ETAPS Foreword


Welcome to the 24th ETAPS! ETAPS 2021 was originally planned to take place in 
Luxembourg in its beautiful capital Luxembourg City. Because of the Covid-19 pan- 
demic, this was changed to an online event. 

ETAPS 2021 was the 24th instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each 
conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from theo- 
retical computer science to foundations of programming languages, analysis tools, and 
formal approaches to software engineering. Organising these conferences in a coherent, 
highly synchronised conference programme enables researchers to participate in an 
exciting event, having the possibility to meet many colleagues working in different 
directions in the field, and to easily attend talks of different conferences. On the 
weekend before the main conference, numerous satellite workshops take place that 
attract many researchers from all over the globe. 

ETAPS 2021 received 260 submissions in total, 115 of which were accepted, 
yielding an overall acceptance rate of 44.2%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con- 
tributions, and in particular the PC (co-)chairs for their hard work in running this entire 
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2021 featured the unifying invited speakers Scott Smolka (Stony Brook 
University) and Jane Hillston (University of Edinburgh) and the conference-specific 
invited speakers Isil Dillig (University of Texas at Austin) for ESOP and Willem Visser 
(Stellenbosch University) for FASE. Invited tutorials were provided by Erika Ábrahám
(RWTH Aachen University) on analysis of hybrid systems and Madhusudan
Parthasarathy (University of Illinois at Urbana-Champaign) on combining machine
learning and formal methods.

ETAPS 2021 was originally supposed to take place in Luxembourg City, Luxem- 
bourg organized by the SnT - Interdisciplinary Centre for Security, Reliability and 
Trust, University of Luxembourg. University of Luxembourg was founded in 2003. 
The university is one of the best and most international young universities with 6,700 
students from 129 countries and 1,331 academics from all over the globe. The local 
organisation team consisted of Peter Y.A. Ryan (general chair), Peter B. Roenne (or- 
ganisation chair), Joaquin Garcia-Alfaro (workshop chair), Magali Martin (event 
manager), David Mestel (publicity chair), and Alfredo Rial (local proceedings chair). 

ETAPS 2021 was further supported by the following associations and societies: 
ETAPS e.V., EATCS (European Association for Theoretical Computer Science), 
EAPLS (European Association for Programming Languages and Systems), and EASST 
(European Association of Software Science and Technology). 

The ETAPS Steering Committee consists of an Executive Board, and representatives
of the individual ETAPS conferences, as well as representatives of EATCS,
EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken),
Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König
(Duisburg), Gerald Lüttgen (Bamberg), Caterina Urban (INRIA), Tarmo Uustalu
(Reykjavik and Tallinn), and Lenore Zuck (Chicago).

Other members of the steering committee are: Patricia Bouyer (Paris), Einar Broch 
Johnsen (Oslo), Dana Fisman (Be’er Sheva), Jan-Friso Groote (Eindhoven), Esther 
Guerra (Madrid), Reiko Heckel (Leicester), Joost-Pieter Katoen (Aachen and Twente), 
Stefan Kiefer (Oxford), Fabrice Kordon (Paris), Jan Křetínský (Munich), Kim G. 
Larsen (Aalborg), Tiziana Margaria (Limerick), Andrew M. Pitts (Cambridge), Grigore 
Rosu (Illinois), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Lutz Schröder 
(Erlangen), Ilya Sergey (Singapore), Mariëlle Stoelinga (Twente), Gabriele Taentzer
(Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), 
Anton Wijs (Eindhoven), Manuel Wimmer (Linz), and Nobuko Yoshida (London). 

I'd like to take this opportunity to thank all the authors, attendees, organizers of the 
satellite workshops, and Springer-Verlag GmbH for their support. I hope you all 
enjoyed ETAPS 2021. 

Finally, a big thanks to Peter, Peter, Magali and their local organisation team for all 
their enormous efforts to make ETAPS a fantastic online event. I hope there will be a 
next opportunity to host ETAPS in Luxembourg. 


February 2021 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


Welcome to the 30th European Symposium on Programming! ESOP 2021 was orig- 
inally planned to take place in Luxembourg. Because of the COVID-19 pandemic, this 
was changed to an online event. ESOP is one of the European Joint Conferences on 
Theory and Practice of Software (ETAPS). It is devoted to fundamental issues in the 
specification, design, analysis, and implementation of programming languages and 
systems. 

This volume contains 24 papers, which the program committee selected among 79 
submissions. Each submission received between three and five reviews. After an author 
response period, the papers were discussed electronically among the 25 PC members 
and 98 external reviewers. The nine papers for which the PC chair had a conflict of 
interest (11% of the total submissions) were kindly handled by Patrick Eugster. 

The quality of the submissions for ESOP 2021 was astonishing, and very sadly, we 
had to reject many strong papers. I would like to thank all the authors who submitted 
their papers to ESOP 2021. 

Finally, I truly thank the members of the program committee. I am very impressed 
by their insightful and constructive reviews — every PC member has contributed very 
actively to the online discussions under this difficult COVID-19 situation, and sup- 
ported Patrick and me. It was a real pleasure to work with all of you! I am also grateful 
to the nearly 100 external reviewers, who provided their expert opinions. 

I would like to thank the ESOP 2020 chair Peter Müller for his instant help and 
guidance on many occasions. I thank all who contributed to the organisation of ESOP:
the ESOP steering committee and its chair Peter Thiemann as well as the ETAPS 
steering committee and its chair Marieke Huisman, who provided help and guidance. 
I would also like to thank Alfredo Rial Duran, Barbara Könich, and Francisco Ferreira 
for their help with the proceedings. 


January 2021 Nobuko Yoshida 

Abstract. We consider the reachability problem for finite-state multi-
threaded programs under the promising semantics (PS 2.0) of Lee et al., 
which captures most common program transformations. Since reachability 
is already known to be undecidable in the fragment of PS 2.0 with only 
release-acquire accesses (PS 2.0-ra), we consider the fragment with only 
relaxed accesses and promises (PS 2.0-rlx). We show that reachability 
under PS 2.0-rlx is undecidable in general and that it becomes decidable, 
albeit non-primitive recursive, if we bound the number of promises. 
Given these results, we consider a bounded version of the reachability 
problem. To this end, we bound both the number of promises and of 
“view-switches”, i.e., the number of times the processes may switch their
local views of the global memory. We provide a code-to-code translation 
from an input program under PS 2.0 (with relaxed and release-acquire 
memory accesses along with promises) to a program under SC, thereby 
reducing the bounded reachability problem under PS 2.0 to the bounded 
context-switching problem under SC. We have implemented a tool and 
tested it on a set of benchmarks, demonstrating that typical bugs in 
programs can be found with a small bound. 


Keywords: Model-Checking · Memory Models · Promising Semantics


1 Introduction 


An important long-standing open problem in PL research has been to define a 
weak memory model that captures the semantics of concurrent memory accesses 
in languages like Java and C/C++. A model is considered good if it can be 
implemented efficiently (i.e., if it supports all usual compiler optimizations and 
its accesses are compiled to plain x86/ARM/Power/RISCV accesses), and is 
easy to reason about. To address this problem, Kang et al. [16] introduced the 
promising semantics. This was the first model that supported basic invariant 
reasoning, the DRF guarantee, and even a non-trivial program logic [80]. 

In the promising semantics, the memory is modeled as a set of timestamped 
messages, each corresponding to a write made by the program. Each pro- 
cess/thread records its own view of the memory—i.e., the latest timestamp for 
each memory location that it is aware of. A message has the form (x, v, (f, t], V)
where x is a location, v a value to be stored for x, (f, t] is the timestamp interval 
corresponding to the write and V is the local view of the process who made the 
write to x. When reading from memory, a process can either return the value 
stored at the timestamp in its view or advance its view to some larger timestamp 
and read from that message. When a process p writes to memory location x, a 
new message with a timestamp larger than p’s view of x is created, and p’s view 
is advanced to include the new message. In addition, in order to allow load-store 
reorderings, a process is allowed to promise a certain write in the future. A 
promise is also added as a message in the memory, except that the local view of 
the process is not updated using the timestamp interval in the message. This is 
done only when the promise is eventually fulfilled. A consistency check is used 
to ensure that every promised message can be certified (i.e., made fulfillable) by 
executing that process on its own. Furthermore, this should hold from any future 
memory (i.e., from any extension of the memory with additional messages). The 
quantification prevents deadlocks (i.e., processes from making promises they are 
not able to fulfil). However, the unbounded number of future memories, that 
need to be checked, makes the verification of even simple programs practically 
infeasible. Moreover, a number of transformations based on global value range 
analysis as well as register promotion were not supported in [16].


To address these concerns, Lee et al. developed a new version of the promising 
semantics, PS 2.0 [22]. PS 2.0 simplifies the consistency check: instead of
checking the promise fulfilment from all future memories, PS 2.0 checks for 
promise fulfilment only from a specially crafted extension of the current memory 
called capped memory. PS 2.0 also introduces the notion of reservations, which 
allows a process to secure a timestamp interval in order to perform a future 
atomic read-modify-write instruction. The reservation blocks any other message 
from using that timestamp interval. Because of these changes, PS 2.0 supports 
register promotion and global value range analysis, while capturing all features 
(process local optimizations, DRF guarantees, hardware mappings) of the original 
promising semantics. Although PS 2.0 can be considered a semantic breakthrough,
it is a very complex model: it supports two memory access modes, relaxed (rlx)
and release-acquire (ra), along with promises, reservations and certifications. 


Let PS 2.0-rlx (resp. PS 2.0-ra) be the fragment of PS 2.0 allowing only 
relaxed (rlx) (resp. release-acquire (ra)) memory accesses. A natural and fundamental
question to investigate is the verification of concurrent programs under
PS 2.0. Consider the reachability problem, i.e., whether a given configuration 
of a concurrent finite-state program is reachable. Reachability with only ra 
accesses has already been shown to be undecidable [1], even without promises
and reservations. That leaves us only the PS 2.0-rlx fragment, which captures the 
semantics of concurrent ‘relaxed’ memory accesses in programming languages 
such as Java and C/C++. We show that if an unbounded number of promises is 
allowed, the reachability problem under PS 2.0-rlx is undecidable. Undecidability 
is obtained with an execution with only 2 processes and 3 context switches, where 
a context is a computation segment in which only one process is active. 

Then, we show that reachability under PS 2.0-rlx becomes decidable if we
bound the number of promises at any time (however, the total number of promises 
made within a run can be unbounded). The proof introduces a new memory 
model with higher order words LoHoW, which we show equivalent to PS 2.0-rlx 
in terms of reachable states. Under the bounded promises assumption, we use 
the decidability of the coverability problem of well structured transition systems 
(WSTS) [13] to show that the reachability problem for LoHoW with bounded 
number of promises is decidable. Further, PS 2.0-rlx without promises and reser- 
vations has a non-primitive recursive lower bound. Our decidability result covers 
the relaxed fragment of the RC11 model (which matches the PS 2.0-rlx 
fragment with no promises). Given the high complexity for PS 2.0-rlx and the 
undecidability of PS 2.0-ra, we next consider a bounded version of the reachability
problem. To this end, we propose a parametric under-approximation in the
spirit of context bounding [9,33,21,26,24,29,1,3]. The aim of context bounding
is to restrict the otherwise unbounded interaction between processes, and has
been shown experimentally in the case of SC programs to maintain enough
behaviour coverage for bug detection [24,29]. The concept of context bounding
has been extended for weak memory models. For instance, for RA, Abdulla et
al. [1] proposed view bounding using the notion of view-switching messages and
a translation that keeps track of the causality between different variables. Since 
PS 2.0 subsumes RA, we propose a bounding notion that extends view bounding. 

Using our new bounding notion, we propose a source-to-source translation 
from programs under PS 2.0 to context-bounded executions of the transformed 
program under SC. The challenges in our translation differ a lot from those in [1],
as we have to provide a procedure that (i) handles different memory accesses rlx
and ra, (ii) guesses the promises and reservations in a non-deterministic manner, 
and (iii) verifies that promises are fulfilled using the capped memory. 

We have implemented this reduction in a tool, PS2SC. Our experimental 
results demonstrate the effectiveness of our approach. We exhibit cases where 
hard-to-find bugs are detectable using a small view-bound. Our tool displays 
resilience to trivial changes in the position of bugs and the order of processes. 
Further, in our code-to-code translation, the mechanism for making and certifying 
promises and reservations is isolated in one module, and can easily be changed 
to cover different variants of the promising semantics. 

For lack of space, detailed proofs can be found in [5]. 


2 Preliminaries 


In this section, we introduce the notation that will be used throughout. 


Notations. Given two natural numbers \(i, j \in \mathbb{N}\) s.t. \(i \le j\), we use \([i, j]\) to denote
\(\{k \mid i \le k \le j\}\). Let \(A\) and \(B\) be two sets. We use \(f : A \to B\) to denote that \(f\) is
a function from \(A\) to \(B\). We define \(f[a \mapsto b]\) to be the function \(f'\) s.t. \(f'(a) = b\)
and \(f'(a') = f(a')\) for all \(a' \ne a\). For a binary relation \(R\), we use \([R]^*\) to denote
its reflexive and transitive closure. Given an alphabet \(\Sigma\), we use \(\Sigma^*\) (resp. \(\Sigma^+\))
to denote the set of possibly empty (resp. non-empty) finite words (also called
simple words) over \(\Sigma\). A higher order word over \(\Sigma\) is an element of \((\Sigma^*)^*\) (i.e.,
a word of words). Let \(w = a_1 a_2 \cdots a_n\) be a simple word over \(\Sigma\); we use \(|w|\) to
denote the length of \(w\). Given an index \(i\) in \([1, |w|]\), we use \(w[i]\) to denote the \(i\)-th
letter of \(w\). Given two indices \(i\) and \(j\) s.t. \(1 \le i \le j \le |w|\), we use \(w[i, j]\) to denote
the word \(a_i a_{i+1} \cdots a_j\). Sometimes, we see a word as a function from \([1, |w|]\) to \(\Sigma\).
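
As a small illustration of this notation, consider the following worked instance. The alphabet, the word, and the function chosen here are assumptions made only for the example and do not come from the paper.

```latex
\[
\begin{aligned}
\Sigma &= \{a, b\}, \qquad w = abba, \qquad |w| = 4,\\
w[2] &= b, \qquad w[2,4] = bba,\\
f &= \{1 \mapsto a,\; 2 \mapsto b\} \;\Longrightarrow\; f[2 \mapsto a] = \{1 \mapsto a,\; 2 \mapsto a\},\\
(ab)\,(\epsilon)\,(bba) &\in (\Sigma^*)^* \quad \text{(a higher order word made of three simple words).}
\end{aligned}
\]
```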


Program Syntax. The simple programming language we use is described in
Figure 1. A program Prog consists of a set Loc of (global) variables or memory
locations, and a set P of processes. Each process p declares a set Reg(p) of (local)
registers followed by a sequence of labeled instructions. We assume that these
sets of registers are disjoint and we use \(Reg := \bigcup_p Reg(p)\) to denote their union.
We assume also a (potentially unbounded) data domain Val from which the
registers and locations take values. All locations and registers are assumed to be
initialized with the special value \(0 \in Val\) (if not mentioned otherwise). An
instruction i is of the form \(\lambda : s\) where \(\lambda\) is a unique label and s is a statement.
We use \(L_p\) to denote the set of all labels of the process p, and \(L = \bigcup_{p \in P} L_p\) the
set of all labels of all processes. We assume that the execution of the process p
always starts with a unique initial instruction labeled by \(\lambda^p_{init}\).

A write instruction of the form \(x^o = \$r\) assigns the value of register $r to
the location x, and o denotes the access mode. If o = rlx, the write is a relaxed
write, while if o = ra, it is a release write. A read instruction \(\$r = x^o\) reads
the value of the location x into the local register $r. Again, if the access mode
o = rlx, it is a relaxed read, and if o = ra, it is an acquire read. Atomic updates
or RMW instructions are either compare-and-swap (\(CAS^{o_r,o_w}\)) or \(FADD^{o_r,o_w}\).
Both have a pair of accesses (\(o_r, o_w \in \{rel, acq, rlx\}\)) to the same location: a
read followed by a write. Following [22], FADD(x, v) stores the value of x into a
register $r, and adds v to x, while \(CAS(x, v_1, v_2)\) compares an expected value
\(v_1\) to the value in x, and if the values are the same, sets the value of x to \(v_2\). The
old value of x is then stored in $r. A local assignment instruction $r = e assigns
to the register $r the value of e, where e is an expression over a set of operators,
constants as well as the contents of the registers of the current process, but not
referring to the set of locations. The fence instruction SC-fence is used to enforce
sequential consistency if it is placed between two memory access operations. For
simplicity, we will write assume(x = e) instead of $r = x; assume($r = e). This
notation is extended in the straightforward manner to conditional statements.

    Prog     ::= var x* (proc p || ... || proc p)
    proc p   ::= Reg(p) i*
    i        ::= λ : s
    s        ::= skip | s; s | assume(x = e)
               | [illegible in source]
               | if e then s else s
               | $r := e | $r := x^o | x^o := $r
               | $r := FADD^{o,o}(x, v)
               | $r := CAS^{o,o}(x, v, v) | SC-fence
    o ∈ Mode ::= rlx | ra

Fig. 1: Syntax of programs.


3 The Promising Semantics 


In this section, we recall the promising semantics [22]. We present here PS 2.0 
with three memory accesses, relaxed, release writes (rel) and acquire reads (acq). 
Read-modify-write (RMW) instructions have two access modes: one for the read
and one for write. We keep aside the release and acquire fences (and subsequent 
access modes), since they do not affect the results of this paper. 


Timestamps. PS 2.0 uses timestamps to maintain a total order over all the
writes to the same variable. We assume an infinite set of timestamps Time,
densely totally ordered by \(<\), with 0 being the minimum element. A view is a
timestamp function \(V : Loc \to Time\) that records the largest known timestamp
for each location. Let \(T\) be the set containing all the timestamp functions, along
with the special symbol \(\bot\). Let \(V_{init}\) represent the initial view where all locations
are mapped to 0. Given two views \(V\) and \(V'\), we use \(V \le V'\) to denote that
\(V(x) \le V'(x)\) for all \(x \in Loc\). The merge operation \(\sqcup\) between two views \(V\) and \(V'\)
returns the pointwise maximum of \(V\) and \(V'\), i.e., \((V \sqcup V')(y)\) is the maximum of
\(V(y)\) and \(V'(y)\). Let \(Z\) denote the set of all intervals over Time. The timestamp
intervals in \(Z\) have the form \((f, t]\) where either \(f = t = 0\) or \(f < t\), with \(f, t \in Time\).
Given an interval \(I = (f, t] \in Z\), I.frm and I.to denote \(f\) and \(t\) respectively.
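
The view operations above can be sketched in a few lines of Python. This is only an illustrative sketch: representing Time by `Fraction` (to mimic density), representing views as dictionaries, and the helper names `init_view`, `leq`, `merge` and `valid_interval` are all assumptions of the example and do not come from the paper.

```python
from fractions import Fraction

# A view maps each location to the largest timestamp known for it.
# Fractions stand in for the dense, totally ordered set Time (an assumption).

def init_view(locs):
    """V_init: every location is mapped to timestamp 0."""
    return {x: Fraction(0) for x in locs}

def leq(v1, v2):
    """V <= V' iff V(x) <= V'(x) for every location x."""
    return all(v1[x] <= v2[x] for x in v1)

def merge(v1, v2):
    """Merge of two views: the pointwise maximum."""
    return {x: max(v1[x], v2[x]) for x in v1}

def valid_interval(f, t):
    """An interval (f, t] requires f = t = 0 or f < t."""
    return (f == 0 and t == 0) or f < t

v = {"x": Fraction(1), "y": Fraction(3)}
w = {"x": Fraction(2), "y": Fraction(1, 2)}
print(merge(v, w))
```

Neither `v` nor `w` is below the other here, while their merge is above both, which is the behaviour the text relies on when views are joined.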


Memory. In PS 2.0, the memory is modelled as a set of concrete messages
(which we just call messages), and reservations. Each message represents the
effect of a write or an RMW operation and each reservation is a timestamp interval
reserved for future use. In more detail, a message m is a tuple \((x, v, (f, t], V)\)
where \(x \in Loc\), \(v \in Val\), \((f, t] \in Z\) and \(V \in T\). A reservation r is a tuple \((x, (f, t])\).
Note that a reservation, unlike a message, does not commit to any particular value.
We use m.loc (r.loc), m.val, m.to (r.to), m.frm (r.frm) and m.View to denote
respectively x, v, t, f and V. Two elements (either messages or reservations) are
said to be disjoint (\(m_1 \# m_2\)) if they concern different variables (\(m_1.loc \ne m_2.loc\))
or their intervals do not overlap (\(m_1.to \le m_2.frm \lor m_1.frm \ge m_2.to\)). Two sets of
elements \(M, M'\) are disjoint, denoted \(M \# M'\), if \(m \# m'\) for every \(m \in M\), \(m' \in M'\).
Two elements \(m_1, m_2\) are adjacent, denoted \(Adj(m_1, m_2)\), if \(m_1.loc = m_2.loc\)
and \(m_1.to = m_2.frm\). A memory \(M\) is a set of pairwise disjoint messages and
reservations. Let \(\widetilde{M}\) be the subset of \(M\) containing only messages (no reservations).
For a location x, let \(M(x)\) be \(\{m \in M \mid m.loc = x\}\). Given a view \(V\) and a
memory \(M\), we say \(V \in M\) if \(V(x) = m.to\) for some message \(m \in M\) for every
\(x \in Loc\). Let \(\mathbb{M}\) denote the set of all memories.
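
To make the disjointness and adjacency conditions concrete, here is a minimal Python sketch. The class `Elem`, its field names, the encoding of a reservation as an element with `val=None`, and the function names are assumptions of this example rather than definitions from the paper.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

@dataclass(frozen=True)
class Elem:
    """A message (loc, val, (frm, to], view) or, when val is None, a reservation."""
    loc: str
    frm: Fraction
    to: Fraction
    val: Optional[int] = None
    view: Optional[tuple] = None  # view as a tuple of (location, timestamp) pairs

def disjoint(m1, m2):
    """Different locations, or intervals (frm, to] that do not overlap."""
    return m1.loc != m2.loc or m1.to <= m2.frm or m1.frm >= m2.to

def adjacent(m1, m2):
    """Same location, and m1 ends exactly where m2 starts."""
    return m1.loc == m2.loc and m1.to == m2.frm

def is_memory(elems):
    """A memory is a set of pairwise disjoint messages and reservations."""
    es = list(elems)
    return all(disjoint(a, b) for i, a in enumerate(es) for b in es[i + 1:])

m1 = Elem("x", Fraction(0), Fraction(1), val=1)
m2 = Elem("x", Fraction(1), Fraction(2), val=2)
r = Elem("x", Fraction(1, 2), Fraction(3, 2))  # reservation overlapping m1 and m2
print(is_memory({m1, m2}), is_memory({m1, m2, r}))
```

Because intervals are half-open on the left, two elements that merely touch (as `m1` and `m2` do) are disjoint and adjacent at the same time.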


Insertion into Memory. Following [22], a memory M can be extended with a 
message (due to the execution of a write/RMW instruction) or a reservation m 
with m.loc = x, m.frm = f and m.to = t in a number of ways: 


Additive insertion \(M \hookleftarrow_A m\) is defined only if (1) \(M \# \{m\}\); (2) if m is a message,
then no message \(m' \in M\) has \(m'.loc = x\) and \(m'.frm = t\); and (3) if m is a
reservation, then there exists a message \(m' \in M\) with \(m'.loc = x\) and \(m'.to = f\).
The extended memory \(M \hookleftarrow_A m\) is then \(M \cup \{m\}\).

Splitting insertion \(M \hookleftarrow_S m\) is defined if m is a message, and if there exists
a message \(m' = (x, v', (f, t'], V')\) with \(t < t'\) in \(M\). Then \(M\) is updated to
\(M \hookleftarrow_S m = (M \setminus \{m'\}) \cup \{m, (x, v', (t, t'], V')\}\).

Lowering insertion \(M \hookleftarrow_L m\) is only defined if there exists \(m'\) in \(M\) that is identical
to \(m = (x, v, (f, t], V)\) except for \(m.View \le m'.View\). Then, \(M\) is updated to
\(M \hookleftarrow_L m = (M \setminus \{m'\}) \cup \{m\}\).
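
The three insertion modes can be sketched as follows. This is a simplified illustration and not the paper's implementation: elements are named tuples, a reservation is encoded by `val=None`, the message view is reduced to a single comparable timestamp, and each function returns `None` when the insertion is undefined. All of these encodings and names are assumptions of the example.

```python
from collections import namedtuple
from fractions import Fraction as F

# val is None for a reservation; view is a simplified, comparable stand-in.
Elem = namedtuple("Elem", "loc val frm to view")

def disjoint(a, b):
    return a.loc != b.loc or a.to <= b.frm or a.frm >= b.to

def additive(M, m):
    """Additive insertion: add m in a free interval."""
    if not all(disjoint(m, e) for e in M):
        return None
    if m.val is not None:
        # (2) no existing message may start exactly where m ends
        if any(e.val is not None and e.loc == m.loc and e.frm == m.to for e in M):
            return None
    elif not any(e.val is not None and e.loc == m.loc and e.to == m.frm for e in M):
        # (3) a reservation must directly follow an existing message
        return None
    return M | {m}

def splitting(M, m):
    """Splitting insertion: carve m out of the front of an existing message."""
    if m.val is None:
        return None
    for e in M:
        if e.val is not None and e.loc == m.loc and e.frm == m.frm and m.to < e.to:
            rest = Elem(e.loc, e.val, m.to, e.to, e.view)
            return (M - {e}) | {m, rest}
    return None

def lowering(M, m):
    """Lowering insertion: replace m' by m, identical except for a smaller view."""
    if m.val is None:
        return None
    for e in M:
        if e._replace(view=m.view) == m and m.view <= e.view:
            return (M - {e}) | {m}
    return None
```

Returning a new set instead of mutating `M` keeps each insertion a pure function, which mirrors how the definitions describe the extended memory.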

Transition System of a Process. Given a process \(p \in P\), a state \(\sigma\) of p is
defined by a pair \((\lambda, R)\) where \(\lambda \in L\) is the label of the next instruction to be
executed by p and \(R : Reg \to Val\) maps each register of p to its current value.
(Observe that we use the set of all labels \(L\) (resp. registers Reg) instead of \(L_p\)
(resp. Reg(p)) in the definition of \(\sigma\) just for the sake of simplicity.) Transitions
between the states of p are of the form \((\lambda, R) \xrightarrow{t}_p (\lambda', R')\) where t is of one of
the following forms: \(\epsilon\), \(rd(o, x, v)\), \(wt(o, x, v)\), \(U(o_r, o_w, x, v_r, v_w)\), and SC-fence. A
transition of the form \((\lambda, R) \xrightarrow{rd(o, x, v)}_p (\lambda', R')\) denotes the execution of a read
instruction of the form \(\$r = x^o\) labeled by \(\lambda\) where (1) \(\lambda'\) is the label of the
next instruction that can be executed after the instruction labelled by \(\lambda\), and
(2) \(R'\) is the mapping that results from updating the value of the register $r in
\(R\) to v. The transition relation \((\lambda, R) \xrightarrow{t}_p (\lambda', R')\) is defined in a similar manner
for the other cases of t, where \(wt(o, x, v)\) stands for a write instruction that
writes the value v to x, \(U(o_r, o_w, x, v_r, v_w)\) stands for an RMW that reads the
value \(v_r\) from x and writes \(v_w\) to it, SC-fence stands for an SC-fence instruction,
and \(\epsilon\) stands for the execution of the other local instructions. Observe that
\(o, o_r, o_w\) are the access modes, which can be rlx or ra. We use ra for both
release and acquire. Finally, we use \((\lambda, R) \xRightarrow{t}_p (\lambda', R')\), with \(t \ne \epsilon\), to denote that
\((\lambda, R) \xrightarrow{\epsilon}_p \cdots \xrightarrow{\epsilon}_p \xrightarrow{t}_p \xrightarrow{\epsilon}_p \cdots \xrightarrow{\epsilon}_p (\lambda', R')\).
P P Pp P p p 


Machine States. A machine state MS is a tuple \(((J, R), VS, PS, M, G)\), where
\(J : P \to L\) maps each process p to the label of the next instruction to be executed,
\(R : Reg \to Val\) maps each register to its current value, \(VS : P \to T\) is the process
view map, which maps each process to a view, \(M\) is a memory and \(PS : P \to \mathbb{M}\)
maps each process to a set of messages (called promise set), and \(G \in T\) is the
global view (that will be used by SC fences). We use \(C\) to denote the set of
all machine states. Given a machine state \(MS = ((J, R), VS, PS, M, G)\) and a
process p, let \(MS{\downarrow}p\) denote \((\sigma, VS(p), PS(p), M, G)\), with \(\sigma = (J(p), R(p))\) (i.e.,
the projection of the machine state to the process p). We call \(MS{\downarrow}p\) the process
configuration. We use \(C_p\) to denote the set of all process configurations.

The initial machine state \(MS_{init} = ((J_{init}, R_{init}), VS_{init}, PS_{init}, M_{init}, G_{init})\) is
one where: (1) \(J_{init}(p)\) is the label of the initial instruction of p; (2) \(R_{init}(\$r) = 0\)
for every \(\$r \in Reg\); (3) for each p, \(VS_{init}(p) = V_{init}\) is the initial view (that maps each
location to the timestamp 0); (4) for each p, the set of promises \(PS_{init}(p)\) is empty;
(5) the initial memory \(M_{init}\) contains exactly one initial message \((x, 0, (0, 0], V_{init})\)
per location x; and (6) the initial global view maps each location to 0.
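
The six conditions on the initial machine state translate directly into a constructor. The following Python sketch is illustrative only: the tuple layout, the encoding of messages as plain tuples, and the names `initial_state` and `project` are assumptions of the example.

```python
from fractions import Fraction

def initial_state(processes, registers, locations, init_label):
    """Build ((J, R), VS, PS, M, G) for the initial machine state.

    init_label maps each process to the label of its initial instruction.
    """
    zero = Fraction(0)
    v_init = {x: zero for x in locations}
    J = dict(init_label)                          # (1) initial labels
    R = {r: 0 for r in registers}                 # (2) registers hold 0
    VS = {p: dict(v_init) for p in processes}     # (3) every view is V_init
    PS = {p: set() for p in processes}            # (4) no promises
    # (5) one initial message (x, 0, (0, 0], V_init) per location;
    # the view is frozen into a tuple so that messages are hashable
    frozen = tuple(sorted(v_init.items()))
    M = {(x, 0, (zero, zero), frozen) for x in locations}
    G = dict(v_init)                              # (6) global view is 0 everywhere
    return (J, R), VS, PS, M, G

def project(ms, p):
    """Process configuration of p (the register map is kept whole for simplicity)."""
    (J, R), VS, PS, M, G = ms
    return (J[p], R), VS[p], PS[p], M, G

ms = initial_state(["p1", "p2"], ["$r1", "$r2"], ["x", "y"], {"p1": "l0", "p2": "k0"})
print(project(ms, "p1"))
```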

Transition Relation. We first describe the transition \((\sigma, V, P, M, G) \to_p
(\sigma', V', P', M', G')\) between process configurations in \(C_p\) from which we induce
the transition relation between machine states.

[Figure 2: inference rules, grouped as Process Helpers (view update on a read),
Memory Helpers (MEMORY : NEW, MEMORY : FULFILL), and Process Steps
(Read, Write, SC-fence, Promise, Update).]

Fig. 2: A subset of PS 2.0 inference rules at the process level. 


Process Relation. The formal definition of \(\to_p\) is given in Figure 2. Below, we
explain these inference rules. Note that the full set of rules can be found in [5].

Read. A process p can read from \(M\) by observing a message \(m = (x, v, (f, t], K)\) if
\(V(x) \le t\) (i.e., p must not be aware of a later message for x). In case of a relaxed
read rd(rlx, x, v), the process view of x is updated to t, while for an acquire read
rd(ra, x, v), the process view is updated to \(V[x \mapsto t] \sqcup K\). The global memory
\(M\), the set of promises \(P\), and the global view \(G\) remain the same.
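
The effect of a read on the process view can be sketched as below. This is a hedged illustration: views are dictionaries, a message is a plain tuple `(x, v, (f, t), K)`, a missing message view is encoded as `None`, and the function name `read` is hypothetical.

```python
from fractions import Fraction as F

def merge(v1, v2):
    """Pointwise maximum of two views over the same locations."""
    return {x: max(v1[x], v2[x]) for x in v1}

def read(view, msg, mode):
    """Return the process view after reading msg, or None if the read is not allowed.

    mode is "rlx" (relaxed read) or "ra" (acquire read).
    """
    x, _value, (_f, t), K = msg
    if view[x] > t:
        # the process already knows a later message for x
        return None
    new_view = dict(view)
    new_view[x] = t
    if mode == "ra" and K is not None:
        # an acquire read additionally merges the message view K
        new_view = merge(new_view, K)
    return new_view

view = {"x": F(0), "y": F(0)}
msg = ("x", 1, (F(0), F(1)), {"x": F(1), "y": F(2)})
print(read(view, msg, "rlx"), read(view, msg, "ra"))
```

The relaxed read only advances the view of `x`, whereas the acquire read also learns about `y` through the view attached to the message.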


Write. A process can add a fresh message to the memory (MEMORY : NEW) or
fulfil an outstanding promise (MEMORY : FULFILL). The execution of a write
(wt(rlx, x, v)) results in a message m with location x along with a timestamp
interval \((-, t]\). Then, the process view for x is updated to t. In case of a release write
(wt(ra, x, v)) the updated process view is also attached to m, and the rule ensures that
the process does not have an outstanding promise on x. (MEMORY : FULFILL)
allows splitting a promise interval or lowering its view before fulfilment.


Update. When a process performs a RMW, it first reads a message m = 
(x, v, (f,t], K) and then writes an update message with frm timestamp equal to 
t; that is, a message of the form \(m' = (x, v', (t, t'], K')\). This prevents any other
write from being placed between m and m′. The access modes of the reads and writes
in the update follow what has been described for the read and write above.


Promise, Reservation and Cancellation. A process can non-deterministically 
promise future writes which are not release writes. This is done by adding a 
message m to the memory \(M\) s.t. \(m \# M\) and to the set of promises \(P\). Later, a
relaxed write instruction can fulfil an existing promise. Recall that the execution
of a release write requires the set of promises to be empty and thus it cannot
be used to fulfil a promise. In the reserve step, the process reserves a timestamp
interval to be used for a later RMW instruction reading from a certain message
without fixing the value it will write. A reservation is added both to the memory
and the promise set. The process can drop the reservation from both sets using
the cancel step in a non-deterministic manner.


SC fences. The process view V is merged with the global view G, resulting in 
V UG as the updated process view and global view. 
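
A minimal sketch of this step, assuming views are dictionaries over the same set of locations; the names `merge` and `sc_fence` and the concrete timestamps are illustrative assumptions:

```python
from fractions import Fraction as F

def merge(v1, v2):
    """Pointwise maximum of two views over the same locations."""
    return {x: max(v1[x], v2[x]) for x in v1}

def sc_fence(process_view, global_view):
    """Return (new process view, new global view) after an SC fence."""
    merged = merge(process_view, global_view)
    # both views become the merge; copies keep them independent afterwards
    return dict(merged), dict(merged)

V = {"x": F(2), "y": F(0)}
G = {"x": F(1), "y": F(3)}
V2, G2 = sc_fence(V, G)
print(V2, G2)
```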


Machine Relation. We are ready now to define the induced transition relation
between machine states. For machine states \(MS = ((J, R), VS, PS, M, G)\) and
\(MS' = ((J', R'), VS', PS', M', G')\), we write \(MS \to_p MS'\) iff (1) \(MS{\downarrow}p \to_p MS'{\downarrow}p\)
and (2) \((J(p'), VS(p'), PS(p')) = (J'(p'), VS'(p'), PS'(p'))\) for all \(p' \ne p\).


Consistency. According to Lee et al. [22], there is one final requirement on 
machine states called consistency, which roughly states that, from every encoun- 
tered machine state, all the messages promised by a process p can be certified 
(i.e., made fulfillable) by executing p on its own from a certain future memory 
(called capped memory), i.e., an extension of the memory with additional reservations.
Before defining consistency, we need to introduce capped memory. 


Cap View, Cap Message and Capped Memory. The last element of a memory
\(M\) with respect to a location x, denoted by \(\overline{m}_{M,x}\), is an element from \(M(x)\)
with the highest timestamp among all elements of \(M(x)\) and is defined as
\(\overline{m}_{M,x} = \max_{m \in M(x)} m.to\). The cap view of a memory \(M\), denoted by \(\widehat{V}_M\), is the
view which assigns to each location x the to timestamp in the message \(\overline{m}_{\widetilde{M},x}\).
That is, \(\widehat{V}_M = \lambda x.\, \overline{m}_{\widetilde{M},x}.to\). Recall that \(\widetilde{M}\) denotes the subset of \(M\) containing
only messages (no reservations). The cap message of a memory \(M\) with respect
to a location x is given by \(\widehat{m}_{M,x} = (x, \overline{m}_{\widetilde{M},x}.val, (\overline{m}_{M,x}.to, \overline{m}_{M,x}.to + 1], \widehat{V}_M)\).

Then, the capped memory of a memory \(M\), wrt. a set of promises \(P\), denoted
by \(\widehat{M}_P\), is an extension of \(M\), defined as: (1) for every \(m_1, m_2 \in M\) with
\(m_1.loc = m_2.loc\), \(m_1.to < m_2.frm\), and there is no message \(m' \in M(m_1.loc)\) such
that \(m_1.to < m'.to < m_2.to\), we include a reservation \((m_1.loc, (m_1.to, m_2.frm])\)
in \(\widehat{M}_P\), and (2) we include a cap message \(\widehat{m}_{M,x}\) in \(\widehat{M}_P\) for every variable x unless
\(\overline{m}_{M,x}\) is a reservation in \(P\).
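
The construction of the capped memory can be sketched as follows. This is a simplified illustration under several assumptions that do not come from the paper: elements are named tuples with `val=None` marking a reservation, views are frozen into tuples, every location is assumed to hold at least one message (as is the case for the initial message), and gaps are filled between consecutive elements of a location (messages or reservations).

```python
from collections import namedtuple
from fractions import Fraction as F

Elem = namedtuple("Elem", "loc val frm to view")  # val is None for a reservation

def capped_memory(M, P):
    """Return the capped memory of M with respect to the promise set P."""
    locs = {e.loc for e in M}
    # cap view: per location, the 'to' timestamp of its last message
    cap_view = {x: max(e.to for e in M if e.loc == x and e.val is not None)
                for x in locs}
    frozen_view = tuple(sorted(cap_view.items()))
    capped = set(M)
    for x in locs:
        elems = sorted((e for e in M if e.loc == x), key=lambda e: e.to)
        # (1) fill every gap between consecutive elements with a reservation
        for a, b in zip(elems, elems[1:]):
            if a.to < b.frm:
                capped.add(Elem(x, None, a.to, b.frm, None))
        # (2) add the cap message unless the last element is a reservation in P
        last = elems[-1]
        if last.val is None and last in P:
            continue
        last_msg = max((e for e in elems if e.val is not None), key=lambda e: e.to)
        capped.add(Elem(x, last_msg.val, last.to, last.to + 1, frozen_view))
    return capped

M = {Elem("x", 0, F(0), F(0), None),
     Elem("x", 1, F(0), F(1), None),
     Elem("x", 2, F(2), F(3), None)}
for e in sorted(capped_memory(M, set()), key=lambda e: e.to):
    print(e)
```

After capping, no free interval remains below the cap message, so a certifying process can only append writes after the cap (or use its own promises and reservations).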


Consistency. A machine state \(MS = ((J, R), VS, PS, M, G)\) is consistent if every
process p can certify/fulfil all its promises from the capped memory \(\widehat{M}_{PS(p)}\), i.e.,
\(((J, R), VS, PS, \widehat{M}_{PS(p)}, G) \; [\to_p]^* \; ((J', R'), VS', PS', M', G')\) with \(PS'(p) = \emptyset\).

The Reachability Problem in PS 2.0. A run of Prog is a sequence of the
form: \(MS_0 \, [\to_{p_{i_1}}]^* \, MS_1 \, [\to_{p_{i_2}}]^* \, MS_2 \, [\to_{p_{i_3}}]^* \cdots [\to_{p_{i_n}}]^* \, MS_n\) where \(MS_0 = MS_{init}\)
is the initial machine state and \(MS_1, \ldots, MS_n\) are consistent machine states.
Then, \(MS_0, \ldots, MS_n\) are said to be reachable from \(MS_{init}\).

Given an instruction label function \(J : P \to L\) that maps each process \(p \in P\)
to an instruction label in \(L_p\), the reachability problem asks whether there exists
a machine state of the form \(((J, R), VS, PS, M, G)\) that is reachable from \(MS_{init}\).
A positive answer to this problem means that \(J\) is reachable in Prog in PS 2.0.


4 Undecidability of Consistent Reachability in PS 2.0 


The reachability problem is undecidable for PS 2.0 even for finite-state programs.
The proof is by a reduction from Post’s Correspondence Problem (PCP) [28]. A
PCP instance consists of two sequences u_1, ..., u_n and v_1, ..., v_n of non-empty
words over some alphabet Σ. Checking whether there exists a sequence of indices
j_1, ..., j_k ∈ {1, ..., n} s.t. u_{j_1} ... u_{j_k} = v_{j_1} ... v_{j_k} is
undecidable. Our proof works with the fragment of PS 2.0 having only relaxed
(rlx) memory accesses and crucially uses unboundedly many promises to ensure
that a process cannot skip any writes made by another process. We construct a
concurrent program with two processes p1 and p2 over a finite data domain. The
code of p1 is split into two modes, a generation mode and a validation mode, by
an if and its else branch. The if branch is entered when the value of a boolean
location validate is 0 (its initial value). We show that reaching the
instructions annotated by // and // in p1, p2 is possible iff the PCP instance
has a solution. We give below an overview of the execution steps leading to the
annotated instructions.


— Process p1 promises to write letters of u_i (one by one) to a location x, and
the respective indices i to a location index. The number of promises made
is arbitrary, since it depends on the length of the PCP solution. Observe
that the sequence of promises made to the variable index corresponds to the
guessed solution of the PCP problem.

— Before switching out of context, p1 certifies its promises using the if branch,
which consists of a loop that non-deterministically chooses an index i and
writes i to index and u_i to x. The promises of p1 are as yet not fulfilled; this
happens in the else branch of p1, when it writes the promised values.

— p2 reads from the sequences of promises written to x and index and copies
them (one by one) to variables y and index′ respectively. Then, p2 sets
validate to 1 and reaches //.

— The else branch in p1 is enabled at this point, where p1 reads the sequence
of indices from index′, and each time it reads an index i from index′, it checks
that it can read the sequence of letters of v_i from y.

— p1 copies the sequence of observed values from y and index′ back to x and
index respectively. To fulfil the promises, it is crucial that the sequence of
read values from index′ (resp. y) is the same as the sequence of promised
values to index (resp. x). Since y holds a sequence v_{i_1} ... v_{i_k}, the promises
are fulfilled if and only if this sequence is the same as the promised sequence
u_{i_1} ... u_{i_k}. This happens only when i_1, ..., i_k is a PCP solution.

— At the end of promise fulfilment, p1 reaches //.
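The condition that the construction ultimately checks, namely that the guessed index sequence is a PCP solution, can be stated as a few lines of code. This is only a sketch of the PCP condition itself, not of the concurrent program; the function name and the example instance are chosen here for illustration.

```python
from typing import Sequence


def is_pcp_solution(us: Sequence[str], vs: Sequence[str],
                    indices: Sequence[int]) -> bool:
    """True iff the 1-based index sequence solves the PCP instance (us, vs)."""
    if not indices:
        return False  # a solution uses at least one index
    if any(i < 1 or i > len(us) for i in indices):
        return False
    top = "".join(us[i - 1] for i in indices)      # u_{i_1} ... u_{i_k}
    bottom = "".join(vs[i - 1] for i in indices)   # v_{i_1} ... v_{i_k}
    return top == bottom


# A small textbook-style instance over the alphabet {a, b}.
us = ["a", "ab", "bba"]
vs = ["baa", "aa", "bb"]
print(is_pcp_solution(us, vs, [3, 2, 3, 1]))
```

In terms of the reduction, the promised values on index play the role of `indices`, the promised values on x spell `top`, and the validation mode checks that the same letters can be read as `bottom`.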


Our undecidability result is also tight in the sense that the reachability problem 
becomes decidable when we restrict ourselves to machine states where the number 
of promises is bounded. Further, our proof is robust: it goes through for PS 1.0 
[16]. Let us call the fragment of PS 2.0 with only rlx memory accesses PS 2.0-rlx. 


Theorem 1. The reachability problem for concurrent programs over a finite data 
domain is undecidable under PS 2.0-rlx. 


5 Decidable Fragments of PS 2.0 


Since keeping ra memory accesses renders the reachability problem undecidable
[1] and so does having unboundedly many promises when having rlx memory
accesses (Theorem 1), we address in this section the decidability problem for
PS 2.0-rlx with a bounded number of promises in any reachable configuration.
Bounding the number of promises in any reachable machine state does not
imply that the total number of promises made during that run is bounded. Let
bdPS 2.0-rlx represent the restriction of PS 2.0-rlx to boundedly many promises
where the number of promises in each reachable machine state is smaller or equal
to a given constant. Notice that the fragment bdPS 2.0-rlx subsumes the relaxed
fragment of the RC11 model [20,16]. We assume here a finite data domain.

To establish the decidability of the reachability of bdPS 2.0-rlx, we introduce 
an alternate memory model for concurrent programs called LoHoW (for “lossy 
higher order words”). We present the operational semantics of LoHoW, and show 
that (1) PS 2.0-rlx is reachability equivalent to LoHoW, (2) under the bounded 
promise assumption, reachability is decidable in LoHoW (hence, bdPS 2.0-rlx). 


Introduction to LoHoW. Given a concurrent program Prog, a state of LoHoW
maintains a collection of higher order words, one per location of Prog, along
with the states of all processes. The higher order word HW_x corresponding to
the location x is a word of simple words, representing the sub memory M(x)
in PS 2.0-rlx. Each simple word in HW_x is an ordered sequence of “memory
types”, that is, messages or promises in M(x), maintained in the order of their
to timestamps in the memory. The word order between memory types in HW_x
represents the order induced by time stamps between memory types in M(x).
The key information to encode in each memory type of HW_x is: (1) is it a message
(msg) or a promise (prm) in M(x), (2) the process (p) which added it to M(x),
the value (val) it holds, (3) the set S (called pointer set) of processes that have
seen this memory type in M(x) and (4) whether the adjacent time interval to
the right of this memory type in M(x) has been reserved by some process.


Memory Types. To keep track of (1-4) above, a memory type is an element of
Σ ∪ Γ with Σ = {msg, prm} × Val × P × 2^P (for 1-3) and Γ = {msg, prm} × Val ×
P × 2^P × P (for 4). We write a memory type as (r, v, p, S, ?). Here r represents
either msg (message) or prm (promise) in M(x), v is the value, p is the process
that added the message/promise, and S is a pointer set of processes whose local
view (on x) agrees with the to timestamp of the message/promise. If the type ∈ Γ,
the fifth component (?) is the process id that has reserved the time slot
right-adjacent to the message/promise. ? is a wildcard that may (or may not) be
matched.

Simple Words. A simple word ∈ Σ*#(Σ ∪ Γ), and each HW_x is a word
∈ (Σ*#(Σ ∪ Γ))*. # is a special symbol not in Σ ∪ Γ, which separates the
last symbol from the rest of the simple word. Consecutive symbols of Σ in a
simple word in HW_x represent adjacent messages/promises in M(x) and are
hence unavailable for a RMW. # does not correspond to any element from the
memory, and is used to demarcate the last symbol of the simple word.


#(msg, 0, q, {})  (msg, 2, p, {p, q})#(msg, 3, r, {r, s})  #(msg, 22, b, {d})  #(msg, 12, w, {w, b})

(positions 1 to 9, counting each # and each memory type)

Fig. 3: A higher order word HW (black) with four embedded simple words (pink).


Higher order words. A higher order word is a sequence of simple words. Figure 3
depicts a higher order word with four simple words. We use a left to right
order in both simple words and higher order words. Furthermore, we extend
in the straightforward manner the classical word indexation strategy to higher
order words. For example, the symbol at the third position of the higher order
word HW in Figure 3 is HW[3] = (msg, 2, p, {p, q}). A higher order word HW is
well-formed iff for every p ∈ P, there is a unique position i in HW having p in its
pointer set; that is, HW[i] is of the form (−, −, −, S, ?) ∈ Σ ∪ Γ s.t. p ∈ S. The
higher order word given in Figure 3 is well-formed. We will use ptr(p, HW) to
denote the unique position i in HW having p in its pointer set. We assume that
all the manipulated higher order words are well-formed.
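One possible concrete encoding of higher order words, together with ptr and the well-formedness check, is sketched below. The list-of-symbols representation (the string "#" for the separator, tuples for memory types) is an assumption of this example and not part of the model. The sample word is a nine-symbol word in the style of Figure 3, written with an explicit # in front of every simple word's last symbol.

```python
from typing import List, Sequence, Union

HASH = "#"  # separator; memory types are tuples (r, v, p, S) or (r, v, p, S, q)
Symbol = Union[str, tuple]


def positions_of(p: str, hw: Sequence[Symbol]) -> List[int]:
    """1-based positions whose pointer set (4th component) contains p."""
    return [i for i, s in enumerate(hw, start=1) if s != HASH and p in s[3]]


def is_well_formed(hw: Sequence[Symbol], procs: Sequence[str]) -> bool:
    """Every process occurs in exactly one pointer set."""
    return all(len(positions_of(p, hw)) == 1 for p in procs)


def ptr(p: str, hw: Sequence[Symbol]) -> int:
    """The unique 1-based position holding p; raises ValueError otherwise."""
    (pos,) = positions_of(p, hw)
    return pos


fs = frozenset
hw = [HASH, ("msg", 0, "q", fs()),
      ("msg", 2, "p", fs({"p", "q"})), HASH, ("msg", 3, "r", fs({"r", "s"})),
      HASH, ("msg", 22, "b", fs({"d"})),
      HASH, ("msg", 12, "w", fs({"w", "b"}))]

print(ptr("p", hw), is_well_formed(hw, ["p", "q", "r", "s", "d", "w", "b"]))
```

Positions are kept 1-based here to match the indexation HW[i] used in the text.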


[Figure 4: memories M(x) and M(y), drawn as promises/messages and reservations
along a timestamp axis, and the corresponding higher order words HW_x and HW_y.]

Fig. 4: Map from memories M(x), M(y) to higher order words HW_x, HW_y.


Each higher order word HW_x represents the entire space [0, ∞) of available
timestamps in M(x). Each simple word in HW_x represents a timestamp interval
(f, t], while consecutive simple words represent disjoint timestamp intervals (while
preserving order). The memory types constituting each simple word take up
adjacent timestamp intervals, spanning the timestamp interval of the simple word.
The adjacency of timestamp intervals within simple words is used in RMW steps
and reservations. The last symbol in a simple word denotes a message/promise
which, (1) if in Σ, is available for a RMW, while (2) if in Γ, is unavailable for
RMW since it is followed by a reservation. Symbols at positions other than
the rightmost in a simple word represent messages/promises which are not
available for RMW. Figure 4 presents a mapping from a memory of PS 2.0-rlx to
a collection of higher order words (one per location) in LoHoW.


Initializing higher order words. For each location x ∈ Loc, the initial higher
order word HW_x^init is defined as #(msg, 0, p1, P), where P is the set of all
processes and p1 is some process in P. The set of all higher order words HW_x^init
for all locations x represents the initial memory of PS 2.0-rlx where all
locations have value 0, and all processes are aware of the initial message.


Simulating PS 2.0 Memory Operations in LoHoW. In the following, we
describe how to handle PS 2.0-rlx instructions in LoHoW. Since we only have
the rlx mode, we denote Reads, Writes and RMWs as rd(x, v), wt(x, v) and
U(x, v_r, v_w), dropping the modes.

Reads. To simulate a rd(x, v) by a process p in LoHoW, we need an index
j ≥ ptr(p, HW_x) in HW_x such that HW_x[j] is a memory type with value v of the
form (−, v, −, S′, ?) (? denotes that the type is either from Σ or Γ). The read is
simulated by adding p to the set S′ and removing it from its previous set.


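[Figure 5: HW_x before and after a read. Before: p ∈ S for the type at position
ptr(p, HW_x), and a type (−, v, −, S′, ?) at a position j ≥ ptr(p, HW_x). After:
the former type has pointer set S \ {p}, the type at j becomes
(−, v, −, S′ ∪ {p}, ?), and ptr(p, HW_x) = j.]

Fig. 5: Transformation of HW_x on a read. (? denotes that type is from Σ or Γ)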
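The read step can be sketched as a function that enumerates every higher order word a read may produce. This is a minimal illustration under stated assumptions: the list encoding (the string "#" and tuples for memory types), the 0-based positions, and the helper names are hypothetical, and the sketch allows p to re-read the type it currently points to (the j ≥ ptr reading).

```python
from typing import List

HASH = "#"


def ptr0(hw: list, p: str) -> int:
    """0-based position of the memory type whose pointer set contains p."""
    return next(i for i, s in enumerate(hw) if s != HASH and p in s[3])


def with_pointers(sym: tuple, pointers: frozenset) -> tuple:
    """Copy of a memory type with its pointer set (4th component) replaced."""
    return sym[:3] + (pointers,) + sym[4:]


def read(hw: list, p: str, v: int) -> List[list]:
    """All higher order words obtainable when p reads value v."""
    cur = ptr0(hw, p)
    results = []
    for j in range(cur, len(hw)):
        if hw[j] == HASH or hw[j][1] != v:
            continue  # separators and other values cannot be read
        new = list(hw)
        new[cur] = with_pointers(new[cur], new[cur][3] - {p})  # leave old set
        new[j] = with_pointers(new[j], new[j][3] | {p})        # join S'
        results.append(new)
    return results


fs = frozenset
hw = [HASH, ("msg", 0, "p1", fs({"p", "q"})), HASH, ("msg", 1, "q", fs())]
after = read(hw, "p", 1)
```

Here `after` contains a single word in which p has moved from the initial message to the message with value 1; reading a value that occurs nowhere to the right yields an empty list.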


Writes. A wt(x, v) by a process p (writing v to x) is simulated by adding a new
msg type in HW_x with a timestamp higher than the view of p for x: (1) add the
simple word #(msg, v, p, {p}) to the right of ptr(p, HW_x), or (2) there is α ∈ Σ
such that the word w#α is in HW_x to the right of ptr(p, HW_x); modify w#α to get
wα#(msg, v, p, {p}). Remove p from its previous pointer set.


[Figure 6: HW_x before and after a write wt(x, v), with old = ptr(p, HW_x) and
p ∈ S for the type at position old. Case (1): the type at old gets pointer set
S \ {p} and a new simple word #(msg, v, p, {p}) appears to its right. Case (2):
the type at old gets pointer set S \ {p} and a simple word w#α to its right
becomes wα#(msg, v, p, {p}). In both cases the new ptr(p, HW_x) is greater
than old.]

Fig. 6: Transformation of HW_x on a write. (? denotes that type is from Σ or Γ).
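The two ways of placing a written message can be sketched as follows. The encoding (a Python list with "#" and tuples), the 0-based positions and the function names are assumptions of this example. The sketch also assumes that the simple word containing p's own pointer may be extended or followed by the new message, since the new message still lands to the right of the pointer.

```python
from typing import List

HASH = "#"


def ptr0(hw: list, p: str) -> int:
    """0-based position of the memory type whose pointer set contains p."""
    return next(i for i, s in enumerate(hw) if s != HASH and p in s[3])


def drop_pointer(hw: list, p: str) -> list:
    """Copy of hw in which p is removed from its pointer set."""
    new = list(hw)
    i = ptr0(new, p)
    s = new[i]
    new[i] = s[:3] + (s[3] - {p},) + s[4:]
    return new


def write(hw: list, p: str, v: int) -> List[list]:
    """All higher order words obtainable when p writes value v."""
    cur = ptr0(hw, p)
    base = drop_pointer(hw, p)
    m = ("msg", v, p, frozenset({p}))
    results = []
    for j in range(cur, len(base)):
        ends_simple_word = j > 0 and base[j - 1] == HASH
        if not ends_simple_word:
            continue
        # (1) new simple word #m right after the simple word ending at j
        results.append(base[:j + 1] + [HASH, m] + base[j + 1:])
        # (2) w#a becomes w a #m, only if a carries no reservation (a in Sigma)
        if len(base[j]) == 4:
            results.append(base[:j - 1] + [base[j], HASH, m] + base[j + 1:])
    return results


fs = frozenset
hw = [HASH, ("msg", 0, "p1", fs({"p", "q"}))]
options = write(hw, "p", 5)
```

For the initial word of this example there are exactly two outcomes: the new message either starts its own simple word or becomes adjacent to the initial message.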


RMWs. Capturing RMWs is similar to the execution of a read followed by
a write. In PS 2.0-rlx, a process p performing an RMW reads from a message
with a timestamp interval (−, t] and adds a message to M(x) with timestamp
interval (t, −]. Capturing RMWs needs higher order words. Consider a
U(x, v_r, v_w) step by process p. Then, there is a simple word ···#(−, v_r, −, S) in
HW_x having (−, v_r, −, S) as the last memory type whose position is to the right
of ptr(p, HW_x). As usual, p is removed from its pointer set, #(−, v_r, −, S) is
replaced with (−, v_r, −, S \ {p})# and (−, v_w, p, {p}) is appended, resulting in
extending ···#(−, v_r, −, S) to ···(−, v_r, −, S \ {p})#(−, v_w, p, {p}).
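The RMW step can be sketched in the same style. The encoding, the 0-based positions and the helper names are again assumptions made for illustration; the new memory type is tagged msg here, and a type at p's current position is treated as readable.

```python
from typing import List

HASH = "#"


def ptr0(hw: list, p: str) -> int:
    """0-based position of the memory type whose pointer set contains p."""
    return next(i for i, s in enumerate(hw) if s != HASH and p in s[3])


def unpoint(sym: tuple, p: str) -> tuple:
    """Copy of a memory type with p removed from its pointer set."""
    return sym[:3] + (sym[3] - {p},) + sym[4:]


def rmw(hw: list, p: str, v_r: int, v_w: int) -> List[list]:
    """All higher order words obtainable from an update U(x, v_r, v_w) by p."""
    cur = ptr0(hw, p)
    results = []
    for j in range(cur, len(hw)):
        s = hw[j]
        is_last = j > 0 and hw[j - 1] == HASH
        # The read type must end a simple word, be unreserved, and hold v_r.
        if not is_last or len(s) != 4 or s[1] != v_r:
            continue
        new = list(hw)
        new[cur] = unpoint(new[cur], p)
        read_type = unpoint(new[j], p)
        written = ("msg", v_w, p, frozenset({p}))
        # ...#(-, v_r, -, S) becomes ...(-, v_r, -, S \ {p})#(msg, v_w, p, {p})
        results.append(new[:j - 1] + [read_type, HASH, written] + new[j + 1:])
    return results


fs = frozenset
hw = [HASH, ("msg", 0, "p1", fs({"p", "q"}))]
reserved = [HASH, ("msg", 0, "p1", fs({"p"}), "q")]
```

The word `reserved` shows why the last symbol must be in Σ: a type tagged with a reservation cannot be extended, so no update on it is possible.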


Promises, Reservations and Cancellations. Handling promises made by a process p
in PS 2.0-rlx is similar to handling wt(x, v): we add the simple word #(prm, v, p, {})
in HW_x to the right of the position ptr(p, HW_x), or append (prm, v, p, {}) at the
end of a simple word with a position larger than ptr(p, HW_x). The memory type
has tag prm (a promise), and the pointer set is empty (since making a promise
does not lift the view of the promising process). Splitting the time interval of a
promise is simulated in LoHoW by inserting a new memory type right before the
corresponding promise memory type (prm, −, p, S), while fulfilment of a promise
by a process p results in replacing (prm, v, p, S) with (msg, v, p, S ∪ {p}).

In PS 2.0-rlx, a process p makes a reservation by adding the pair (x, (f, t])
to the memory, given that there is a message/promise in the memory with
timestamp interval (−, f]. In LoHoW this is captured by “tagging” the rightmost
memory type (message/promise) in a simple word with the name of the process
that makes the reservation. This requires us to consider the memory types from
Γ = {msg, prm} × Val × P × 2^P × P where the last component stores the process
which made the reservation. Such a memory type always appears at the end of a
simple word, and represents that the next timestamp interval adjacent to it has
been reserved. Observe that nothing can be added to the right of a memory type
of the form (msg, v, p, S, q). Thus, reservations are handled as follows.


(Res) Assume the rightmost symbol in a simple word is (msg, v, p, S). To capture
the reservation by q, (msg, v, p, S) is replaced with (msg, v, p, S, q).

(Can) A cancellation is done by removing the last component q from 
(msg, v, p, S, q) resulting in (msg, v, p, S). 
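The (Res) and (Can) steps amount to adding or removing the fifth component of the last memory type of a simple word. A minimal sketch, assuming a list encoding with "#" as separator, tuples as memory types and 0-based positions (all hypothetical choices for this example):

```python
HASH = "#"


def make_reservation(hw: list, q: str, j: int) -> list:
    """Tag the memory type at 0-based position j with the reserving process q."""
    s = hw[j]
    ends_simple_word = j > 0 and hw[j - 1] == HASH
    if s == HASH or len(s) != 4 or not ends_simple_word:
        raise ValueError("only an unreserved last symbol can be tagged")
    return hw[:j] + [s + (q,)] + hw[j + 1:]


def cancel_reservation(hw: list, q: str, j: int) -> list:
    """Remove q's tag from the memory type at 0-based position j."""
    s = hw[j]
    if s == HASH or len(s) != 5 or s[4] != q:
        raise ValueError("position j holds no reservation by q")
    return hw[:j] + [s[:4]] + hw[j + 1:]


fs = frozenset
hw = [HASH, ("msg", 0, "p", fs({"p"}))]
tagged = make_reservation(hw, "q", 1)
restored = cancel_reservation(tagged, "q", 1)
```

Cancelling a reservation that was just made returns the original word, which mirrors the fact that a reservation leaves the message itself untouched.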


Certification. In PS 2.0-rlx, certification for a process p happens from the capped
memory, where intermediate time slots (other than reserved ones) are blocked,
and any new message can be added only at the maximal timestamp. This is
handled in LoHoW by one of the following: (1) Addition of new memory types is
allowed only at the right end of any HW_x, or (2) If the rightmost memory type
in HW_x is of form (−, v, −, −, q) with q ≠ p (a reservation by q), then the word
#(msg, v, q, {}) is appended at the end of HW_x.

Memory is altered in PS 2.0-rlx during the certification phase to check for
promise fulfilment, and at the end of the certification phase, we resume from
the memory which was there before. To capture this in LoHoW, we work on a
duplicate of (HW_x)_{x∈Loc} in the certification phase. Notice that the
duplication allows empty memory types to be lost non-deterministically: these
are memory types whose pointer set is empty, as well as redundant simple words,
which are simple words consisting entirely of empty memory types. This copy of
HW_x is then modified during certification, and is discarded once we finish the
certification phase.


5.1 Formal Model of LoHoW 


In the following, we formally define LoHoW and state the equivalence of the
reachability problem in PS 2.0-rlx and LoHoW. For a memory type m = (r, v, p, S)
(or m = (r, v, p, S, q)), we use m.value to denote v. For a memory type
m = (r, v, p, S, ?) and a process p′ ∈ P, we define the following:
add(m, p′) = (r, v, p, S ∪ {p′}, ?) and del(m, p′) = (r, v, p, S \ {p′}, ?). This
corresponds to the addition/deletion of the process p′ to/from the set of
pointers of m. Extending the above notation, given a higher order word HW, a
position i ∈ {1, ..., |HW|}, and p ∈ P, we define the following:
add(HW, p, i) = HW[1, i−1] · add(HW[i], p) · HW[i+1, |HW|],
del(HW, p) = HW[1, j−1] · del(HW[j], p) · HW[j+1, |HW|] with j = ptr(p, HW), and
mov(HW, p, i) = add(del(HW, p), p, i). This corresponds to the
addition/deletion/relocation of the pointer p to/from the word HW[i].

Insertion into higher order words. A higher order word HW can be extended
in position 1 ≤ j ≤ |HW| with a memory type m = (r, v, p, {p}) as follows:

• Insertion as a new simple word is defined only if HW[j − 1] = # (i.e., the
position j is the end of a simple word). Let HW′ = del(HW, p) (i.e., removing p
from its previous set of pointers). Then, the insertion of m results in

$$HW \oplus_j m = HW'[1, j] \cdot \underbrace{\#(r, v, p, \{p\})}_{\text{new simple word}} \cdot HW'[j+1, |HW|]$$

• Insertion at the end of a simple word is defined only if HW[j − 1] = # and
HW[j] ∈ Σ (i.e., the last memory type in the simple word should be free from
reservations). Let HW′ = del(HW, p). For HW′ = w1 · #m′ · w2 and |w1 · #m′| = j,
the insertion of m results in

$$HW \oplus_j m = w_1 \cdot \underbrace{m' \cdot \#(r, v, p, \{p\})}_{m \text{ extends } m'} \cdot w_2$$

• Splitting a promise is defined only if m′ = HW[j] has form (prm, −, p, −, ?) (i.e.,
the memory type at position j is a promise). Let HW′ = del(HW, p). Then,

$$HW \oplus_j m = \begin{cases} HW'[1, j-2] \cdot \underbrace{(r, v, p, \{p\}) \cdot \#m'}_{m \text{ splits } m'} \cdot HW'[j+1, |HW|] & \text{if } HW'[j-1] = \# \\ HW'[1, j-1] \cdot \underbrace{(r, v, p, \{p\}) \cdot m'}_{m \text{ splits } m'} \cdot HW'[j+1, |HW|] & \text{if } HW'[j-1] \neq \# \end{cases}$$

Observe that in both cases we insert the new type m just before position j.

• Fulfilment of a promise is defined only if m′ = HW[j] is of the form (prm, v, p, S)
or (prm, v, p, S, q). Let HW′ = del(HW, p). Then, the extended higher order
word is

$$HW \oplus_j m = HW'[1, j-1] \cdot \underbrace{(\mathrm{msg}, v, p, S \cup \{p\}, ?)}_{m' \text{ is fulfilled by } p} \cdot HW'[j+1, |HW'|]$$

where ? is q if m′ = (prm, v, p, S, q) ∈ Γ and is omitted if m′ = (prm, v, p, S) ∈ Σ.


Making/Canceling a reservation. A higher order word HW can also be
modified by p by making/cancelling a reservation at a position 1 ≤ j ≤ |HW|. We
define the operation Make(HW, p, j) (Cancel(HW, p, j)) that reserves (cancels)
a time slot at j. Make(HW, p, j) (resp. Cancel(HW, p, j)) is only defined if
HW[j] is of the form (r, v, q, S) (resp. (r, v, q, S, p)) and HW[j − 1] = #. Then,
we have Make(HW, p, j) = HW[1, j − 1] · (r, v, q, S, p) · HW[j + 1, |HW|] and
Cancel(HW, p, j) = HW[1, j − 1] · (r, v, q, S) · HW[j + 1, |HW|].

Process configuration in LoHoW. A configuration of p ∈ P in LoHoW consists
of a pair (σ, HW) where (1) σ is the process state maintaining the instruction
label and the register values (see Subsection 3), and (2) HW is a mapping from
the set of locations to higher order words. The transition relations
$\xrightarrow{\mathrm{std}}_p$ and $\xrightarrow{\mathrm{cert}}_p$ between process
configurations are given in Figure 7; the transition relation
$\xrightarrow{\mathrm{cert}}_p$ is used only in the certification phase while
$\xrightarrow{\mathrm{std}}_p$ is used to simulate the standard phase of
PS 2.0-rlx. A read operation in both phases (standard and certification)
is handled by reading a value from a memory type which is on the right of the
current pointer of p. A write operation, in the standard phase, can result in the
insertion, on the right of the current pointer of p, of a new memory type at the
end of a simple word or as a new simple word. The memory type resulting from
a write in the certification phase is only allowed to be inserted at the end of the
higher order word or at the reserved slots (using the rule splitting a reservation).
A write can also be used to fulfil a promise or to split a promise (i.e., partial
fulfilment) during both phases. Making/canceling a reservation will result
in tagging/untagging a memory type at the end of a simple word on the right
of the current pointer of p. The case of RMW is similar to a read followed by a
write operation (whose resulting memory type should be inserted to the right of
the read memory type). Finally, a promise can only be made during the standard
phase and the resulting memory type will be inserted at the end of a simple word
or as a new word on the right of the current pointer of p.


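$$\dfrac{\sigma \xrightarrow{\mathrm{rd}(x,v)}_p \sigma' \quad i \geq \mathrm{ptr}(p, HW(x)) \quad v = HW(x)[i].\mathrm{value} \quad HW' = HW[x \mapsto \mathrm{mov}(HW(x), p, i)]}{(\sigma, HW) \xrightarrow{a}_p (\sigma', HW') \quad a \in \{\mathrm{cert}, \mathrm{std}\}} \ \text{Read}$$

$$\dfrac{\sigma \xrightarrow{\mathrm{wt}(x,v)}_p \sigma' \quad i > \mathrm{ptr}(p, HW(x)) \quad HW' = HW[x \mapsto (HW(x) \oplus_i (\mathrm{msg}, v, p, \{p\}))]}{(\sigma, HW) \xrightarrow{\mathrm{std}}_p (\sigma', HW')} \ \text{Write}$$

$$\dfrac{i > \mathrm{ptr}(p, HW(x)) \quad HW' = HW[x \mapsto \mathrm{Make}(HW(x), p, i)]}{(\sigma, HW) \xrightarrow{\mathrm{std}}_p (\sigma, HW')} \ \text{Making a reservation}$$

$$\dfrac{\sigma \xrightarrow{U(x, v_r, v_w)}_p \sigma' \quad i > \mathrm{ptr}(p, HW(x)) \quad v_r = HW(x)[i].\mathrm{value} \quad HW' = HW[x \mapsto (HW(x) \oplus_i (\mathrm{msg}, v_w, p, \{p\}))]}{(\sigma, HW) \xrightarrow{\mathrm{std}}_p (\sigma', HW')} \ \text{Standard update}$$

$$\dfrac{i > \mathrm{ptr}(p, HW(x)) \quad HW' = HW[x \mapsto (HW(x) \oplus_i (\mathrm{prm}, v, p, \{\}))]}{(\sigma, HW) \xrightarrow{\mathrm{std}}_p (\sigma, HW')} \ \text{Promise}$$

$$\dfrac{\sigma \xrightarrow{\text{SC-fence}}_p \sigma' \quad i_x = \max(\mathrm{ptr}(p, HW(x)), \mathrm{ptr}(g, HW(x))) \quad HW' = HW[x \mapsto \mathrm{mov}(HW(x), p, i_x)]_{x \in Loc}[x \mapsto \mathrm{mov}(HW(x), g, i_x)]_{x \in Loc}}{(\sigma, HW) \xrightarrow{a}_p (\sigma', HW') \quad a \in \{\mathrm{std}, \mathrm{cert}\}} \ \text{SC-fence}$$

Fig. 7: A subset of LoHoW inference rules at the process level.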


Losses in LoHoW. Let HW and HW′ be two higher order words in (Σ*#(Σ ∪ Γ))+.
Let us assume that HW = u_1#a_1 u_2#a_2 ... u_k#a_k and
HW′ = v_1#b_1 v_2#b_2 ... v_m#b_m, with u_i, v_j ∈ Σ* and a_i, b_j ∈ Σ ∪ Γ. We
extend the subword relation ⊑ to higher order words as follows: HW ⊑ HW′ iff
there is a strictly increasing function f : {1, ..., k} → {1, ..., m} s.t.
(1) u_i ⊑ v_{f(i)} for all 1 ≤ i ≤ k, (2) a_i = b_{f(i)}, and (3) we have the same
number of memory types of the form (prm, −, −, −) or (prm, −, −, −, −) in HW and
HW′. The relation ⊑ corresponds to the loss of some special empty memory types
and redundant simple words (as explained earlier). The relation ⊑ is extended to
mappings from locations to higher order words as follows: HW ⊑ HW′ iff
HW(x) ⊑ HW′(x) for all x ∈ Loc.
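The ordering on higher order words can be checked with a greedy left-to-right matching, sketched below. This is an illustration under assumptions: a higher order word is encoded as a list of pairs (u, a), where u is the list of symbols before # and a is the last symbol, and the relation on Σ* is taken to be the plain scattered-subword relation; the function names are hypothetical.

```python
from typing import List, Tuple

SimpleWord = Tuple[list, tuple]  # (u, a): prefix u in Sigma*, last symbol a


def is_subword(u: list, v: list) -> bool:
    """True iff u can be obtained from v by deleting symbols."""
    it = iter(v)
    return all(any(x == y for y in it) for x in u)


def count_promises(hw: List[SimpleWord]) -> int:
    return sum(1 for u, a in hw for s in [*u, a] if s[0] == "prm")


def hw_leq(small: List[SimpleWord], big: List[SimpleWord]) -> bool:
    """Check small ⊑ big: conditions (1)-(3) of the definition."""
    if count_promises(small) != count_promises(big):
        return False  # condition (3): promises are never lost
    k = 0  # index of the next simple word of `small` to embed
    for v, b in big:
        if k < len(small):
            u, a = small[k]
            if a == b and is_subword(u, v):  # conditions (2) and (1)
                k += 1
    return k == len(small)


fs = frozenset
m0 = ("msg", 0, "p", fs({"p", "q"}))
e = ("msg", 1, "q", fs())          # an empty memory type (no pointers)
pr = ("prm", 3, "p", fs())
big = [([e], m0), ([], e)]
small = [([], m0)]
```

Matching each simple word of the smaller word to the earliest compatible simple word of the larger one suffices, because conditions (1) and (2) are checked pairwise and f only needs to be strictly increasing.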


LoHoW states. A LoHoW state st is a tuple ((J, R), HW) where J : P → L
maps each process p to the label of the next instruction to be executed,
R : Reg → Val maps each register to its current value, and HW is a mapping from
locations to higher order words. The initial LoHoW state st_init is defined as
((J_init, R_init), HW_init) where: (1) J_init(p) is the label of the initial
instruction of p; (2) R_init($r) = 0 for every $r ∈ Reg; and
(3) HW_init(x) = HW_x^init for all x ∈ Loc.

For two LoHoW states st = ((J, R), HW) and st′ = ((J′, R′), HW′) and
a ∈ {std, cert}, we write $st \xrightarrow{a}_p st'$ iff one of the following
cases holds: (1) $((J(p), R), HW) \xrightarrow{a}_p ((J'(p), R'), HW')$ and
J(p′) = J′(p′) for all p′ ≠ p, or (2) (J, R) = (J′, R′) and HW ⊑ HW′.


Two phases LoHoW states. A two-phases state of LoHoW is
S = (π, p, st_std, st_cert) where π ∈ {cert, std} is a flag describing whether the
LoHoW is in “standard” phase or “certification” phase, p is the process which
evolves in one of these phases, while st_std, st_cert are two LoHoW states (one
for each phase). When the LoHoW is in the standard phase, then st_std evolves,
and when the LoHoW is in certification phase, st_cert evolves. A two-phases
LoHoW state is said to be initial if it is of the form (std, p, st_init, st_init),
where p ∈ P is any process. The transition relation → between two-phases LoHoW
states is defined as follows: Given S = (π, p, st_std, st_cert) and
S′ = (π′, p′, st′_std, st′_cert), we have S → S′ iff one of the following cases
holds:

— During the standard phase. π = π′ = std, p = p′, st_cert = st′_cert, and
$st_{std} \xrightarrow{\mathrm{std}}_p st'_{std}$. This corresponds to simulating a
standard step of process p.

— During the certification phase. π = π′ = cert, p = p′, st_std = st′_std and
$st_{cert} \xrightarrow{\mathrm{cert}}_p st'_{cert}$. This simulates a certification
step of process p.

— From the standard phase to the certification phase. π = std, π′ =
cert, p = p′, st_std = st′_std = ((J, R), HW), and st′_cert is of the form
((J, R), HW′) where for every x ∈ Loc, HW′(x) = HW(x)#(msg, v, q, {}) if
HW(x) is of the form w · #(−, v, −, −, q) with q ≠ p, and HW′(x) = HW(x)
otherwise. This corresponds to the copying of the standard LoHoW state to
the certification LoHoW state in order to check if the set of promises made by
the process p can be fulfilled. This transition rule can be implemented by a
sequence of transitions which copies one symbol at a time, from HW to HW′.

— From the certification phase to standard phase. π = cert, π′ = std,
st_std = st′_std, st_cert = st′_cert, and st_cert is of the form ((J, R), HW) where
HW(x) does not contain any memory type of form (prm, −, p, −, ?) for all x ∈ Loc
(i.e., all promises made by p are fulfilled).
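The two phase switches can be sketched as follows: entering certification copies the standard memory and caps words that end in a foreign reservation, and leaving certification requires that no promise of p remains. The dictionary-of-lists encoding and the function names are assumptions of this example.

```python
from typing import Dict

HASH = "#"


def start_certification(hw_map: Dict[str, list], p: str) -> Dict[str, list]:
    """Copy the standard-phase memory for a certification run of process p."""
    cert = {}
    for x, hw in hw_map.items():
        last = hw[-1]
        if len(last) == 5 and last[4] != p:
            # Word ends with a reservation by q != p: append #(msg, v, q, {}).
            v, q = last[1], last[4]
            cert[x] = list(hw) + [HASH, ("msg", v, q, frozenset())]
        else:
            cert[x] = list(hw)
    return cert


def may_end_certification(hw_map: Dict[str, list], p: str) -> bool:
    """True iff no memory type of the form (prm, -, p, -, ?) remains."""
    return not any(s != HASH and s[0] == "prm" and s[2] == p
                   for hw in hw_map.values() for s in hw)


fs = frozenset
std = {
    "x": [HASH, ("msg", 0, "p", fs({"p", "q"}), "q")],  # reserved by q
    "y": [HASH, ("msg", 0, "p", fs({"p", "q"}))],
}
cert = start_certification(std, "p")
```

Because `start_certification` builds fresh lists, changes made during certification do not affect the standard-phase words, matching the fact that the certification copy is discarded afterwards.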


The Reachability Problem in LoHoW. Given an instruction label function
J : P → L that maps each p ∈ P to a label in L_p, the reachability problem
in LoHoW asks whether there exists a two phases LoHoW state S of the form
(std, −, ((J, R), HW), ((J′, R′), HW′)) s.t. (1) HW(x) and HW′(x) do not
contain any memory type of the form (prm, −, p, −, ?) for all x ∈ Loc, and (2) S is
reachable in LoHoW (i.e., S_0 [→]* S where S_0 is an initial two-phases LoHoW
state). A positive answer to this problem means J is reachable in Prog in LoHoW.
The following theorem states the equivalence between LoHoW and PS 2.0-rlx in
terms of reachable instruction label functions.


Theorem 2. An instruction label function J is reachable in a program Prog in 
LoHoW iff J is reachable in Prog in PS 2.0-rlx. 


5.2 Decidability of LoHoW with Bounded Promises 


The equivalence of the reachability in LoHoW and PS 2.0-rlx, coupled with
Theorem 1, shows that reachability is undecidable in LoHoW. To recover
decidability, we look at LoHoW with only a bounded number of the promise memory
type in any higher order word. Let K-LoHoW denote LoHoW with a number of promises
bounded by K. (Observe that K-LoHoW corresponds to bdPS 2.0-rlx.)


Theorem 3. The reachability problem is decidable for K-LoHoW. 


As a corollary of Theorem 3, the decidability of reachability follows for
bdPS 2.0-rlx. The proof makes use of the framework of Well-Structured
Transition Systems (WSTS) [7,13]. Next, we state that the reachability problem for
K-LoHoW (even for K = 0) is highly non-trivial (i.e., non-primitive recursive).
The proof is done by reduction from the reachability problem for lossy channel
systems, in a manner similar to the case of TSO [8], where we insert SC-fence
instructions everywhere in the process that simulates the lossy channel process
(in order to ensure that no promises can be made by that process).


Theorem 4. The reachability problem for K-LoHoW is non-primitive recursive. 


6 Source to Source Translation 


In this section, we propose an algorithmic approach for state reachability in
concurrent programs under PS 2.0. We first recall the notion of view altering
reads [1], and that of bounded contexts in SC [29].


View Altering Reads. A read from the memory is view altering if it changes the
view of the process reading it. This means that the view in the message being
read from was greater than the process view on some variable. The message
which is read from in turn is called a view altering message. A run in which the
total number of view altering reads (across all threads) is bounded (by some
parameter) is called a view-bounded run. The underapproximate analysis for
PS 2.0-ra without promises and reservations [1] considered view bounded runs.

Essential Events. An essential event in a run ρ of a program under PS 2.0 is
either a promise, a reservation or a view altering read by some process in the run.

Bounded Context. A context is an uninterrupted sequence of actions by a single
process. In a run having K contexts, the execution switches from one process
to another K − 1 times. A K bounded context run is one where the number of
context switches is bounded by K ∈ N. The K bounded context reachability
problem in SC checks for the existence of a K bounded context run reaching
some chosen instruction. Now we define the notion of bounding for PS 2.0.
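As a small illustration of the context-bounding notion recalled above, the following sketch counts contexts and context switches in a schedule, given as the sequence of process names that take successive steps. The representation of a run as a list of process names and the function names are assumptions of this example.

```python
from typing import Sequence


def count_contexts(schedule: Sequence[str]) -> int:
    """Number of maximal uninterrupted blocks of steps by a single process."""
    contexts = 0
    previous = None
    for proc in schedule:
        if proc != previous:
            contexts += 1
            previous = proc
    return contexts


def is_context_bounded(schedule: Sequence[str], k: int) -> bool:
    """True iff the number of context switches in the schedule is at most k."""
    switches = max(count_contexts(schedule) - 1, 0)
    return switches <= k


schedule = ["p1", "p1", "p2", "p2", "p1"]
print(count_contexts(schedule), is_context_bounded(schedule, 2))
```

The example schedule has three contexts and hence two context switches, in line with the observation that a run with K contexts switches K − 1 times.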


The Bounded Consistent Reachability Problem. A run ρ of a concurrent
program under PS 2.0,
$MS_0\ [\rightarrow]^*\ MS_1\ [\rightarrow]^*\ MS_2\ [\rightarrow]^* \cdots [\rightarrow]^*\ MS_n$,
is called K bounded iff the number of essential events in ρ is ≤ K. The K
bounded reachability problem for PS 2.0 checks for the existence of a run ρ
of Prog which is K-bounded. Assuming Prog has n processes, we propose an
algorithm that reduces the K bounded reachability problem to a K + n bounded
context reachability problem of a program ⟦Prog⟧ under SC.

Translation Overview. We now provide a brief overview of the data structures
and procedures utilized in our translation; the full details and correctness are in
[5]. Let Prog be a concurrent program under PS 2.0 with set of processes P and
locations Loc. Our algorithm relies on a source to source translation of Prog to a
bounded context SC program ⟦Prog⟧, as shown in Figure 8, and operates on the
same data domain (which need not be finite). The translation (1) adds a new
process (MAIN) that initializes the global variables of ⟦Prog⟧, and (2) for each
process p ∈ P adds local variables, which are initialized by the function INITPROC.


⟦Prog⟧ := ⟨global vars⟩; ⟨MAIN⟩; (⟦proc p reg $r* i*⟧)*
⟦proc p reg $r* i*⟧ := proc p reg $r*
                          ⟨local vars⟩ ⟨INITPROC⟩ ⟨CSO⟩^{p,λ0} (⟦i⟧^p)*
⟦λ : i⟧^p := λ : ⟨CSI⟩; ⟦s⟧^p; ⟨CSO⟩^{p,λ}
⟦if exp then i* else i*⟧^p := if exp then (⟦i⟧^p)* else (⟦i⟧^p)*
⟦while exp do i*⟧^p := while exp do (⟦i⟧^p)*
⟦assume(exp)⟧^p := assume(exp)
⟦$r = exp⟧^p := $r = exp
⟦x =^o $r⟧^p, o ∈ {rlx, ra} := see write Pseudocode
⟦$r =^o x⟧^p, o ∈ {rlx, ra} := see read Pseudocode

Fig. 8: Source-to-source translation map


This is followed by the code block ⟨CSO⟩^{p,λ0} (Context Switch Out) that
optionally enables the process to switch out of context. For each λ-labeled
instruction i in p, the map ⟦λ : i⟧^p transforms it into a sequence of instructions
as follows: the code block ⟨CSI⟩ (Context Switch In) checks if the process is
active in the current context; then it transforms each statement s of instruction
i into a sequence of instructions following the map ⟦s⟧^p, and finally executes
the code block ⟨CSO⟩^{p,λ}. ⟨CSO⟩^{p,λ} facilitates two things: when the process is
at an instruction label λ, (1) it allows p to make promises/reservations after λ,
s.t. the control is back at λ after certification; (2) it ensures that the machine
state is consistent when p switches out of context. The translation of assume, if
and while statements keeps the same statement. The translations of read and write
statements are described later. The translation of RMW statements is omitted for
ease of presentation.


The set of promises a process makes has to be constrained with respect to the
set of promises that it can certify. To address this, in the translation, processes
run in two modes: a ‘normal’ mode and a ‘check’ (consistency check) mode. In
the normal mode, a process does not make any promises or reservations. In the
check mode, the process may make promises and reservations and subsequently
certify them before switching out of context. In any context, a process first enters
the normal mode, and then, before exiting the context, it enters the check mode.
The check mode is used by the process to (1) make new promises/reservations and
(2) certify consistency of the machine state. We also add an optional parameter,
called certification depth (certDepth), which constrains the number of steps a
process may take in the check mode to certify its promises. Figure 9 shows the
structure of a translated run under SC.


[Figure 9: control-flow diagram of a translated run: init, then in each context
a process in normal mode (n) followed by consistency check mode (cc), with
contexts linked by CSI/CSO blocks, ending in ASSERT(false).]

Fig. 9: Control flow: In each context, a process runs first in normal mode n and
then in consistency check mode cc. The transitions between these modes are
facilitated by the CSO code block of the respective process. We check assertion
failures for K + n context-bounded executions (j < K + n).


To reduce the PS 2.0 run into a bounded context SC run, we use the bound 
on the number of essential events. From the run ρ in PS 2.0, we construct a 
K-bounded run ρ′ in PS 2.0 where the processes run in the order of generation of 
essential events. So, the process which generates the first essential event is run 
first, till that event happens, then the process which generates the second 
essential event is run, and so on. This continues till K + n contexts: the K 
bounds the number of essential events, and the n is to ensure all processes are run 
to completion. The bound on the number of essential events gives a bound on the 
number of timestamps that need to be maintained. As observed in [1], each view 
altering read requires two timestamps; additionally, each promise/reservation 
requires one timestamp. Since we have K such essential events, 2K timestamps 
suffice. We choose Time = {0, 1, 2, ..., 2K} as the set of timestamps. We now 
give a brief, high-level overview of the translation. 


Data Structures. The message data structure represents a message generated 
as a write or a promise and has four fields: (i) var, the address of the memory 
location written to; (ii) t, the timestamp in the view associated with the message; 
(iii) v, the value written; and (iv) flag, which keeps track of whether it is a message 
or a promise and, in case of a promise, which process it belongs to. The View 
data structure stores, for each memory location x, (i) a timestamp t ∈ Time, (ii) 
a value v written to x, and (iii) a Boolean l ∈ {true, false} representing whether 
t is an exact timestamp (which can be used for essential events) or an abstract 
timestamp (which corresponds to non-essential events). 
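
As a rough illustration of the timestamp set and the two data structures described above, the following Python sketch may help. The class names, field names, the use of strings for memory locations, and the value K = 3 are assumptions made for this example only; the actual translation targets C programs and is not shown in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

K = 3                        # assumed bound on essential events (illustrative)
TIME = range(0, 2 * K + 1)   # Time = {0, 1, ..., 2K}


@dataclass
class ViewEntry:
    t: int       # timestamp drawn from TIME
    v: int       # value written to the location
    exact: bool  # True: exact timestamp, False: abstract timestamp


# A view maps every memory location to its entry (hypothetical encoding).
View = Dict[str, ViewEntry]


@dataclass
class Message:
    var: str                           # memory location written to
    t: int                             # timestamp of the message
    v: int                             # value written
    promised_by: Optional[int] = None  # None: plain message, else owner process


def is_promise(m: Message) -> bool:
    """The flag field is modelled here by the optional owner id."""
    return m.promised_by is not None


view: View = {"x": ViewEntry(t=0, v=0, exact=True)}
promise = Message(var="x", t=2, v=42, promised_by=1)
```

Folding the flag and the owning process into one optional field is a design shortcut of this sketch; a C encoding would more likely use an integer flag.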


Global Variables. The Memory is an array of size K holding elements of type 
message. This array is populated with the view altering messages, promises and 
reservations generated by the program. We maintain counters for (1) the number 
of elements in Memory; (2) the number of context switches that have occurred; 
and (3) the number of essential events that have occurred. 


Local Variables. In addition to its local registers, each process has local variables 
including (1) a local variable view which stores a local instance of the view 
function (this is of type View), (2) a flag denoting whether the process is running 
in the current context, and (3) a flag checkMode denoting whether the process 
is in the certification phase. We implement the certification phase as a function 
call, and hence store the process state and return address, while entering it. 


6.1 Translation Maps 


In what follows we illustrate how the translation simulates a run under PS 2.0. 
At the outset, recall that each process alternates, in its execution, between two 
modes: a normal mode (n in Figure 9) at the beginning of each context and the 
check mode at the end of the current context (cc in Figure 9), where it may 
make new promises and certify them before switching out of context. 


Context Switch Out (CSO). We describe the CSO module; Algorithm 1 
of Figure 10 provides its pseudocode. The CSO block of process p is placed after 
each instruction λ in the original program and serves as an entry and exit point 
for the consistency check phase of the process. When in normal mode (n) after 
some instruction λ, CSO non-deterministically guesses whether the process should 
exit the context at this point. If so, it sets the checkMode flag to true and saves 
its local state and the return address (to mark where to resume execution in 
the next context). The process then continues its execution in the consistency 
check mode (cc) from the current instruction label (λ) itself. Now the process 
may generate new promises (see Algorithm 1 of Figure 10) and certify these as 
well as earlier made promises. In order to conclude the check mode phase, the 
process will enter the CSO block at some different instruction label λ′. Since 
the checkMode flag is now true, the process enters the else branch and verifies 
that there are no outstanding promises of p to be certified. Since the promises are 
not yet fulfilled, when p switches out of context, it has to mark all its promises 
uncertified. When the context is back to p again, this will be used to fulfil the 
promises or to certify them again before the context switches out of p again. 
Then it exits the check mode phase, setting checkMode to false. Finally, it loads 
the saved state, returns to the instruction label λ (where it entered check 
mode), and exits the context. Another process may now resume execution. 


Algorithm 1: CSO 

    /* nondeterministically enter check mode and exit context */ 
    if nondet() then 
        if ¬checkMode then 
            /* enter consistency check */ 
            if not in context then 
                enter context 
            end 
            checkMode ← true 
            save localstate 
            returnAddr ← λ 
        else 
            /* consistency check successful! */ 
            ensure all promises certified 
            /* for next context */ 
            mark all promises uncertified 
            checkMode ← false 
            load localstate 
            goto returnAddr 
            exit context 
        end 
    end 


Algorithm 2: Write 

    update localstate with write 
    if nondet() then 
        /* (i) no fresh timestamp */ 
        if checkMode then 
            /* write is not a promise */ 
            certify message with reservation or splitting 
        end 
    else if nondet() then /* (ii) new ts */ 
        generate a view; generate a message 
        if checkMode then 
            insert message into Memory as Promise and certify 
        else 
            insert message into Memory as concrete message 
        end 
    else /* (iii) fulfill old promise */ 
        get Promise from Memory 
        check variable, value and view match 
        if checkMode then 
            mark message as certified 
        else 
            mark message as fulfilled 
        end 
        replace message into Memory 
    end 


Fig. 10: Algorithms for CSO and Write 


Write Statements. The translation of a write instruction [x := $r]_o, where 
o ∈ {rlx, ra}, of a process p is given in Algorithm 2 of Figure 10. This is the 
general pseudocode for both kinds of memory accesses, with specific details 
pertaining to the particular access mode omitted. Let us first consider execution 
in the normal mode (i.e., checkMode is false). First, the process updates its local 
state with the value that it will write. Then, the process non-deterministically 
chooses one of three possibilities for the write: it either (i) does not assign a fresh 
timestamp (non-essential event), (ii) assigns a fresh timestamp and adds it to 
memory, or (iii) fulfils some outstanding promise. 
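
The three-way choice for a write in normal mode can be sketched as follows. This is only a schematic Python rendering under several assumptions: the explicit `choice` argument stands in for the non-deterministic guess that CBMC would resolve, views are omitted, and the check against the bound in case (ii) as well as the names `Msg`, `State` and `write_normal_mode` are hypothetical and not taken from PS2SC.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Msg:
    var: str
    val: int
    ts: int
    promise_of: Optional[int] = None  # owning process if this is a promise
    fulfilled: bool = False


@dataclass
class State:
    memory: List[Msg] = field(default_factory=list)
    next_ts: int = 1
    essential_events: int = 0


def write_normal_mode(st: State, pid: int, var: str, val: int,
                      choice: str, bound_k: int) -> str:
    """One translated write with checkMode == false; `choice` replaces nondet()."""
    if choice == "no_fresh_ts":        # (i) non-essential event, nothing published
        return "local-only"
    if choice == "fresh_ts":           # (ii) fresh timestamp, message goes to Memory
        if st.essential_events >= bound_k:
            return "blocked"           # assumed: the bound K cuts this branch off
        st.memory.append(Msg(var, val, st.next_ts))
        st.next_ts += 1
        st.essential_events += 1
        return "published"
    # (iii) fulfil an outstanding promise of this process that matches the write
    for m in st.memory:
        if (m.promise_of == pid and not m.fulfilled
                and (m.var, m.val) == (var, val)):
            m.fulfilled = True
            return "fulfilled"
    return "blocked"


st = State(memory=[Msg("y", 7, 5, promise_of=0)])
results = [
    write_normal_mode(st, 0, "x", 1, "no_fresh_ts", bound_k=2),
    write_normal_mode(st, 0, "x", 1, "fresh_ts", bound_k=2),
    write_normal_mode(st, 0, "y", 7, "fulfil", bound_k=2),
    write_normal_mode(st, 0, "y", 7, "fulfil", bound_k=2),
]
```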


Let us now consider a write executing when checkMode is true, and highlight 
differences with the normal mode. In case (i), non-essential events exclude 
promises and reservations. Then, while in the certification phase, since we use a 
capped memory, the process can make a write if either (1) the write interval can 
be generated through splitting insertion or (2) the write can be certified with 
the help of a reservation. Basically, the writes we make either split an existing 
interval (and add this to the left of a promise) or form part of a reservation. 
Thus, the timestamp of a neighbour is used. In case (ii), when a fresh timestamp 
is used, the write is made as a promise and then certified before switching out of 
context. The analogue of case (iii) is the certification of promises for the current 
context; promise fulfilment happens only in the normal mode. To help a process 
decide the value of a promise, we use the fact that CBMC allows us to assign a 
non-deterministic value to a variable. On top of that, we have implemented an 
optimization that checks the set of possible values to be written in the future. 


Read Statements. The translation of a read instruction [$r := x]_o, 
o ∈ {rlx, ra}, of process p is given in Algorithm 3 of Figure 11. The process first 
guesses whether it will read from a view altering message in the memory or from 
its local view. If it is the latter, the process must first verify whether it can read 
from the local view; for instance, reading from the local view may not be possible 
after execution of a fence instruction when the timestamp of a variable x gets 
incremented from the local view t to t′ > t. In the case of a view altering read, 
we first check that we have not reached the context switching/essential event 
bound. Then the new message is fetched from Memory and we check that the view 
(timestamps) in the acquired message satisfies the conditions imposed by the 
access type ∈ {ra, rlx}. Finally, the process updates its view with that of the new 
message and increments the counters for the context switches and the essential 
events. Theorem 5 proves the correctness of our translation. 


Algorithm 3: Read 

    if nondet() then /* local read */ 
        check local state is valid 
        update local state with read 
    else /* nonlocal (view-switching) read */ 
        check if local state allows read and get message from Memory 
        check new message satisfies conditions for read 
        update local state 
    end 


Fig. 11: Algorithm for Read 


Theorem 5. Given a program Prog under PS 2.0, and K ∈ ℕ, the source to 
source translation constructs a program ⟦Prog⟧ whose size is polynomial in Prog 
and K such that there is a K-bounded run of Prog under PS 2.0 reaching a set 
of instruction labels if and only if there is a (K + n)-bounded context run of 
⟦Prog⟧ under SC that reaches the same set of instruction labels. 


7 Implementation and Experimental Results 


In order to check the efficiency of the source-to-source translation, we implement 
a prototype tool, PS2SC, which is the first tool to handle PS 2.0. PS2SC takes as 
input a C program and a bound K and translates it to a program Prog′ to be run 
under SC. We use CBMC v5.10 as the backend verifier for Prog′. CBMC takes 
as input L, the loop unrolling parameter for bounded model checking of Prog′. If 
PS2SC returns unsafe, then the program has an unsafe execution. Conversely, if 
it returns safe, then none of the executions within the explored subset violates 
any assertion. K may be iteratively incremented to increase the number of 
executions explored. PS2SC also supports partial promises, allowing only a subset 
of processes to promise, which provides an effective under-approximation technique. 
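
The iterative increment of K mentioned above amounts to a simple driver loop. The sketch below is only illustrative: `check` is a hypothetical callback standing in for one translate-and-verify round with a given K (the text does not describe how PS2SC is invoked), and `toy_check` is an invented stand-in for a program whose bug needs at least four essential events.

```python
from typing import Callable, Optional


def find_bug(check: Callable[[int], bool], k_max: int) -> Optional[int]:
    """Return the smallest K <= k_max for which check(K) reports unsafe.

    A result of None only means that no violation exists within the
    explored bounds, not that the program is safe in general.
    """
    for k in range(1, k_max + 1):
        if check(k):
            return k
    return None


def toy_check(k: int) -> bool:
    # Hypothetical verifier: the assertion violation needs K >= 4.
    return k >= 4


smallest_k = find_bug(toy_check, k_max=10)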

We now report the results of experiments we have performed with PS2SC. We 
have two objectives: (1) studying the performance of PS2SC on thin-air litmus 
tests and benchmarks utilizing promises, and (2) comparing PS2SC with other 
model checkers when operating in the promise-free mode. In the first case we 
show that PS2SC is able to uncover bugs in litmus tests and examples with few 
reads and writes to the shared memory. When this interaction and the subsequent 
non-determinism of PS 2.0 increase, we also enable partial promises. For the 
second case we compare PS2SC with three model checkers, CDSChecker [25], 
GenMC and RCMC, that support the promise-free subset of PS 2.0. 
Our observations highlight the ability to detect hard-to-find bugs with small 
K for unsafe benchmarks. We do not consider compilation time for any tool 
while reporting the results. For PS2SC, the time reported is the time taken by 
the CBMC backend for analysis. The timeout used is 1 hour for all benchmarks. 
All experiments are conducted on a machine with a 3.00 GHz Intel Core i5-3330 
CPU and 8 GB RAM running an Ubuntu-16 64-bit operating system. We denote 
timeout by ‘TO’, and memory limit exceeded by ‘MLE’. 


Benchmarks Utilizing Promises. In the following, we report the performance 
of PS2SC on litmus tests and parametrized tests. 

Litmus Tests. We test PS2SC on litmus tests adapted from [16,22,11,23]. These 
examples are small programs that serve as barebones thin-air tests for the C11 
memory model. Consistency tests based on the Java Memory Model are proposed 
in [23], which were experimented on by [27] with their MRDer tool. Like MRDer, 
PS2SC is able to verify most of these tests within 1 minute, which shows its ability 
to handle typical programming idioms of PS 2.0 (see Table 1). 


    testcase      K             PS2SC 
    ARM_weak      4             0.7653s 
    Upd-Stuck     4             1.252s 
    split         4             25.737s 
    [illegible]   [illegible]   [illegible] 
    CYC           5             1.967s 
    [illegible]   2             [illegible] 
    Pugh3         3             12.920s 
    Pugh8         3             1.67s 
    [illegible]   5             380s 
    Pugh13        5             3.345s 

Table 1: Litmus Tests 


Parameterized Tests. In Table 2 we consider unsafe examples adapted from the 
Fibonacci-based benchmarks of SV-COMP 2019 [10]. In these examples a process 
is required to generate a promise (speculative write) with value as the i-th 
Fibonacci number. This promise is certified using process-local reads. Thus, 
though the parameter i increases, the interaction of the promising process with 
the memory remains constant. The CAS variant requires the process to make 
use of reservations. We note that PS2SC uncovers the bugs effectively in these 
cases. In cases where the promise certificate requires reads from external 
processes, the amount of shared-memory interaction increases with i. In this 
case, we use partial promises. 


    testcase          K   PS2SC 
    fib_local_3       4   0.742s 
    fib_local_4       4   0.761s 
    fib_local_cas_3   4   1.132s 
    fib_local_cas_4   4   1.147s 

    testcase          K   PS2SC[1p] 
    fib_global_2      4   55.972s 
    fib_global_3      4   2m4s 
    fib_global_4      4   4m20s 
    exp_global_1      4   19m37s 
    exp_global_2      4   41m12s 

Table 2: Above: testcases with local reads, Below: global reads 


How to recover tractable analysis? We note that though the above example 
consists of several processes interacting with the memory, the bug can be 
uncovered even if only a single process is allowed to make promising writes. We 
run PS2SC in the partial-promises mode. We considered the case where only a 
single process generates promises, and PS2SC was able to uncover the bug. The 
results obtained are in Table 2, where PS2SC[1p] denotes that only one process is 
permitted to perform promises. We then repeat our experiments on other unsafe 
benchmarks, including ExponentialBug from Fig. 2 of [15], and make similar 
observations. To summarize, we note that the huge non-determinism of PS 2.0 
can be fought by using the modular approach of partial promises. 


Comparing with Other Tools. In this section, we compare the performance of 
PS2SC in promise-free mode with CDSChecker [25], GenMC [18] and RCMC 
[17] (which do not support promises). The main objective of this section is to 
provide evidence for the practicability of the essential-event-bounding technique. 
The results of this section indicate that the source-to-source translation with 
K-essential-event bounding is effective at uncovering hard-to-find bugs in 
non-trivial programs. Additionally, we observe that in most examples considered, 
we had K < 10. We provide here a subset of the experimental results and the 
remaining in the full version of the paper [5]. In the tables that follow we provide 
the value of K (for PS2SC) and the value of L (loop-unrolling bound) for all tools. 


Parameterized Benchmarks. In Table 3 we experiment on two parametrized 
benchmarks: ExponentialBug (Fig. 2 of [15]) and Fibonacci (from SV-COMP 2019). 
In ExponentialBug(N), N is the number of writes made to a variable by a process. 
We note that in ExponentialBug(N) the number of executions grows as N!, while 
the processes have to follow a specific interleaving to uncover the hard-to-find 
bug. In Fibonacci(N), two processes compute the value of the n-th Fibonacci 
number in a distributed fashion. 


    benchmark               L    K    PS2SC     CDSC      GenMC     RCMC 
    exponential_25_unsafe   25   10   3.532s    7.239s    3.736s    TO 
    exponential_50_unsafe   50   10   6.128s    36.361s   39.920s   TO 
    fibonacci_3_unsafe      3    20   9.392s    46m8s     0.462s    0.544s 
    fibonacci_4_unsafe      4    20   34.019s   TO        12.437s   18.953s 

Table 3: Parameterized benchmarks 
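
To give a feel for the factorial growth noted for ExponentialBug(N), the short Python snippet below tabulates N! for a few values of N. It only illustrates the order of magnitude; the exact number of executions of the benchmark is not given in the text, and the helper name is an assumption.

```python
import math


def executions_order(n: int) -> int:
    """Order of growth N! quoted for the executions of ExponentialBug(N)."""
    return math.factorial(n)


for n in (5, 10, 25, 50):
    digits = len(str(executions_order(n)))
    print(f"N = {n:2d}: N! has {digits} decimal digits")
```

Even for the smallest parameter in Table 3 (N = 25), exhaustive enumeration is out of reach, which is why a bound on essential events matters for this benchmark.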


Concurrent data structures based benchmarks. In Table 4 we consider 
benchmarks based on concurrent data structures. The first of these is a 
concurrent locking algorithm originating from [14]. The second, LinuxLocks(N), 
is adapted from evaluations of CDSChecker [25]. We note that if not completely 
fenced, it is unsafe. We fence all but one lock access. Both these results show the 
ability of our tool to uncover bugs with a small value of K. 


    benchmark            L   K   PS2SC     CDSC     GenMC    RCMC 
    hehner2_unsafe       4   5   7.207s    0.033s   0.094s   0.087s 
    hehner3_unsafe       4   5   28.345s   0.036s   2m53s    1m13s 
    linuxlocks2_unsafe   2   4   0.547s    0.032s   0.073s   0.078s 
    linuxlocks3_unsafe   2   4   1.031s    0.031s   0.083s   0.081s 

Table 4: Concurrent data structures 


Variations of mutual exclusion protocols. We consider variants of mutual 
exclusion protocols from SV-COMP 2019. The fully fenced versions of the protocols 
are safe. We modify these protocols by introducing bugs and compare the 
performance of PS2SC for bug detection with the other tools. These benchmarks 
are parameterized by the number of processes. In Table 5, we unfence a single 
process of the Peterson and Szymanski protocols, making them unsafe. These 
are benchmarks petersonU(i) and szymanskiU(i), where i is the number of 
processes. 

In petersonB(i), we keep all processes fenced but introduce a bug into the 
critical section of a process (write a value to a shared variable and read a 
different value from it). We note that the other tools do not scale, while PS2SC 
is able to detect the bug within one minute, showing that essential 
event-bounding is an effective under-approximation technique for bug-finding. 


    benchmark       L   K   PS2SC     CDSChecker   GenMC    RCMC 
    petersonU(4)    1   6   1.408s    0.039s       TO       9.129s 
    petersonU(8)    1   6   47.786s   TO           TO       TO 
    szymanskiU(4)   1   2   1.015s    0.043s       MLE      TO 
    szymanskiU(8)   1   2   6.176s    TO           TO       TO 
    petersonB(3)    1   2   0.487s    0.053s       0.083s   0.087s 
    petersonB(5)    1   2   2.713s    TO           TO       TO 
    petersonB(7)    1   2   11.008s   TO           TO       TO 

Table 5: Mutual exclusion benchmarks with a single unfenced process 


Remark. Through all these experiments, we observe that SMC tools and our tool 
try to tackle the same problem by using orthogonal approaches to finding bugs. 
Hence, through the experiments above we are not trying to pitch one approach 
against the other, but rather trying to highlight the differences in their features. 
We have exhibited examples where our tool is able to uncover hard-to-find bugs 
faster than the others with relatively small values of K. 


8 Related Work and Conclusion 


Most of the existing verification work for C/C++ concurrency models concerns 
the development of stateless model checking coupled with dynamic partial order 
reduction (e.g., [6,17,18,26,25]) and does not handle the promising semantics. 
Context-bounding has been proposed for programs running under SC. This 
work has been extended in different directions and has led to efficient and scalable 
techniques for the analysis of concurrent programs (see e.g., [24,21,33,32,12,34]). 
In the context of weak memory models, context-bounded analyses have been 
proposed for TSO/PSO [BI] and POWER [3]. 

The decidability of the verification problems for programs running under 
weak memory models has been addressed for TSO [8], RA [1], SRA [19], and 
POWER [2]. We believe that our proof techniques can be easily adapted to work 
with different variants of the promising semantics [16] (see [4]). For instance, in 
the code-to-code translation, the mechanism for making and certifying promises 
and reservations is isolated in one module, which can be easily changed to cover 
different variants of the promising semantics. Furthermore, the undecidability 
proof still goes through for . Moreover, providing a tool for the verification 
of (among other things) litmus tests, will provide a valuable environment which 
can be used in further improvements of the promising semantics. To the best of 
our knowledge, this is the first time that this problem is investigated for PS 2.0-rlx 
and PS2SC is the first tool for automated verification of programs under PS 2.0. 
Finally, studying the decidability problem for related models that solve the 
thin-air problem (e.g., Paviotti et al. [27]) is interesting and kept as future work. 


Abstract. Asynchronous message-passing systems are employed frequently 
to implement distributed mechanisms, protocols, and processes. 
This paper addresses the problem of precise data flow analysis for such 
systems. To obtain good precision, data flow analysis needs to some- 
how skip execution paths that read more messages than the number 
of messages sent so far in the path, as such paths are infeasible at run 
time. Existing data flow analysis techniques do elide a subset of such 
infeasible paths, but have the restriction that they admit only finite 
abstract analysis domains. In this paper we propose a generalization of 
these approaches to admit infinite abstract analysis domains, as such 
domains are commonly used in practice to obtain high precision. We have 
implemented our approach, and have analyzed its performance on a set 
of 14 benchmarks. On these benchmarks our tool obtains significantly 
higher precision compared to a baseline approach that does not elide any 
infeasible paths and to another baseline that elides infeasible paths but 
admits only finite abstract domains. 


Keywords: Data Flow Analysis - Message-passing systems. 


1 Introduction 


Distributed software that communicates by asynchronous message passing is a very 
important software paradigm in today’s world. It is employed in varied domains, 
such as distributed protocols and workflows, event-driven systems, and UI-based 
systems. Popular languages used in this domain include Go (https://golang.org/), 
Akka (https://akka.io/), and P (https://github.com/p-org). 

Analysis and verification of asynchronous systems is an important problem, 
and poses a rich set of challenges. The research community has focused historically 
on a variety of approaches to tackle this overall problem, such as model checking 
and systematic concurrency testing [25,13], formal verification to check properties 
such as reachability or coverability of states [41,3,2,21,18,31,19,1], and data flow 
analysis [29]. 

Data flow analysis [32,30] is a specific type of verification technique that 
propagates values from an abstract domain while accounting for all paths in a 
program. It can hence be used to check whether a property or assertion always 
holds. The existing verification and data flow analysis approaches mentioned 
earlier have a major limitation, which is that they admit only finite abstract 
domains. This, in general, limits the classes of properties that can be successfully 
verified. On the other hand, data flow analysis of sequential programs using infinite 
abstract domains, e.g., constant propagation [32], interval analysis [12], and 
octagons [44], is a well developed area, and is routinely employed in verification 
settings. In this paper we seek to bridge this fundamental gap, and develop a 
precise data flow analysis framework for message-passing asynchronous systems 
that admits infinite abstract domains. 


1.1 Motivating Example: Leader election 


 1: max := process number; send (1, max) 
 2: Process is in active mode 
 3: while true do 
 4:   if process is in passive mode then 
 5:     receive a mesg and send this same mesg 
 6:   else if message (1, i) arrives then 
 7:     if i ≠ max then 
 8:       Send message (2, i); left := i 
 9:     else 
10:       Declare max as the global maximum 
11:       nr_leaders++; assert(nr_leaders = 1) 
12:   else if message (2, i) arrives then 
13:     if left > i and left > max then 
14:       max := left 
15:       Send message (1, max) 
16:     else 
17:       Process enters passive mode 


Fig. 1. Pseudo-code of each process in leader election, and a partial run 


To motivate our work we use a benchmark program³ in the Promela language [25] 
that implements a leader election protocol [17]. In the protocol there 
is a ring of processes, and each process has a unique number. The objective is 
to discover the “leader”, which is the process with the maximum number. The 
pseudo-code of each process in the protocol is shown in the left side of Figure 1. 
Each process has its own copy of local variables max and left, whereas nr_leaders 
is a global variable that is common to all the processes (its initial value is zero). 
Each process sends messages to the next process in the ring via an unbounded 
FIFO channel. Each process becomes “ready” whenever a message is available 
for it to receive, and at any step of the protocol any one ready process (chosen 
non-deterministically) executes one iteration of its “while” loop. (We formalize 
these execution rules in a more general fashion in Section 2.1.) The messages 
are 2-tuples (x, i), where x can be 1 or 2, and 1 ≤ i ≤ max. The right side of 
Figure 1 shows a snapshot at an intermediate point during a run of the protocol. 
Each dashed arrow between two nodes represents a send of a message and a 
(completed) receipt of the same message. The block arrow depicts the channel 
from Process 2 to Process 1, which happens to contain three sent (but still 
unreceived) messages. 

³ File assertion.leader.prm in www.imm.dtu.dk/~albl/promela-models.zip. 

It is notable that in any run of the protocol, Lines 10-11 happen to get 
executed only by the actual leader process, and that too, exactly once. Hence, 
the assertion never fails. The argument for this claim is not straightforward, and 
we refer the reader to the paper [17] for the details. 
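
The claim can be explored experimentally with a small simulation of the pseudo-code of Figure 1. The Python sketch below is not the Promela benchmark itself: the scheduler, the `seed` and `max_steps` parameters, and the choice to stop once no message is in flight are assumptions of this illustration. Comments refer to the line numbers of Figure 1.

```python
import random
from collections import deque


def run_leader_election(ids, seed=0, max_steps=100_000):
    """Simulate the ring protocol; return (nr_leaders, declared maximum)."""
    rng = random.Random(seed)
    n = len(ids)
    inbox = [deque() for _ in range(n)]  # inbox[p]: FIFO channel into process p
    mx = list(ids)                       # per-process variable max
    left = [None] * n                    # per-process variable left
    active = [True] * n                  # Line 2: every process starts active
    nr_leaders, declared = 0, None

    def send(p, msg):
        inbox[(p + 1) % n].append(msg)   # next process in the ring

    for p in range(n):                   # Line 1
        send(p, (1, mx[p]))
    for _ in range(max_steps):
        ready = [p for p in range(n) if inbox[p]]
        if not ready:                    # no message in flight: the run is over
            return nr_leaders, declared
        p = rng.choice(ready)            # any ready process may take a step
        kind, i = inbox[p].popleft()
        if not active[p]:                # Lines 4-5: passive processes relay
            send(p, (kind, i))
        elif kind == 1:
            if i != mx[p]:               # Lines 7-8
                send(p, (2, i))
                left[p] = i
            else:                        # Lines 9-11
                nr_leaders += 1
                assert nr_leaders == 1
                declared = mx[p]
        elif left[p] > i and left[p] > mx[p]:   # Lines 12-15
            mx[p] = left[p]
            send(p, (1, mx[p]))
        else:                            # Lines 16-17
            active[p] = False
    raise RuntimeError("step limit reached")


result = run_leader_election([3, 1, 2])
```

Such a simulation only samples individual runs; the point of the data flow analysis discussed next is to establish the assertion for all runs.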


1.2 Challenges in property checking 


Data flow analysis could be used to verify the assertion in the example above, e.g., 
using the Constant Propagation (CP) abstract domain. This analysis determines 
at each program point whether each variable has a fixed value, and if yes, the 
value itself, across all runs that reach the point. In the example in Figure 1, all 
actual runs of the system that happen to reach Line 10 come there with value 
zero for the global variable nr_leaders. 

A challenge for data flow analysis on message-passing systems is that there 
may exist infeasible paths in the system. These are paths with more receives of a 
certain message than the number of copies of this message that have been sent so 
far. For instance, consider the path that consists of two back-to-back iterations 
of the “while” loop by the leader process, both times through Lines 3,6,9-11. This 
path is not feasible, due to the impossibility of having two copies of the message 
(1, max) in the input channel [17]. The second iteration would bring the value 1 
for nr_leaders at Line 10, thus inferring a non-constant value and hence declaring 
the assertion as failing (which would be a false positive). 
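
The effect of the infeasible path on Constant Propagation can be seen on the join operation alone. The following Python sketch uses a hypothetical encoding of the CP lattice for a single variable (`BOTTOM` for "unreachable", `TOP` for "non-constant"); it is an illustration, not the implementation evaluated in this paper.

```python
TOP = "non-constant"  # lattice top: no single fixed value
BOTTOM = None         # lattice bottom: program point not reached


def join(a, b):
    """Join of two Constant Propagation facts for one variable."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    return a if a == b else TOP


# Value of nr_leaders on arrival at Line 10 of Figure 1.
feasible_only = join(BOTTOM, 0)           # the one feasible visit brings 0
with_infeasible = join(feasible_only, 1)  # the infeasible second visit brings 1
```

With only the feasible path the analysis keeps the constant 0 and the assertion can be verified; once the infeasible path is joined in, the fact degrades to `TOP` and a false positive results.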

Hence, it is imperative in the interest of precision for any data flow analysis 
or verification approach to track the channel contents as part of the exploration 
of the state space. Tracking the contents of unbounded channels precisely is 
known to be undecidable even when solving problems such as reachability and 
coverability (which are simpler than data flow analysis). Hence, existing ap- 
proaches either bound the channels (which in general causes unsoundness), or use 
sound abstractions such as unordered channels (also known as the Petri Net or 
VASS abstraction) or lossy channels. Such abstractions suffice to elide a subset 
of infeasible paths. In our running example, the unordered channel abstraction 
happens to suffice to elide infeasible paths that could contribute to a false positive 
at the point of the assertion. However, the analysis would need to use an abstract 
domain such as CP to track the values of integer variables. This is an infinite 
domain (due to the infinite number of integers). The most closely related previous 
dataflow analysis approach for distributed systems [29] does use the unordered 
channel abstraction, but does not admit infinite abstract domains, and hence 
cannot verify assertions such as the one in the example above. 


1.3 Our Contributions 


This paper is the first one to the best of our knowledge to propose an approach 
for data flow analysis for asynchronous message-passing systems that (a) admits 
infinite abstract domains, (b) uses a reasonably precise channel abstraction among 
the ones known in the literature (namely, the unordered channels abstraction), 
and (c) computes maximally precise results possible under the selected channel 
abstraction. Every other approach we are aware of exhibits a strict subset of the 
three attributes listed above. It is notable that previous approaches do tackle the 
infinite state space induced by the unbounded channel contents. However, they 
either do not reason about variable values at all, or only allow variables that are 
based on finite domains. 

Our primary contribution is an approach that we call Backward DFAS. This 
approach is maximally precise, and admits a class of infinite abstract domains. 
This class includes well-known examples such as Linear Constant Propagation 
(LCP) [51] and Affine Relationships Analysis (ARA) [46], but does not include 
the full (CP) analysis. We also propose another approach, which we call Forward 
DFAS, which admits a broader class of abstract domains, but is not guaranteed 
to be maximally precise on all programs. 

We describe a prototype implementation of both our approaches. On a set of 
14 real benchmarks, which are small but involve many complex idioms and paths, 
our tool verifies approximately 50% more assertions than our implementation of 
the baseline approach [29]. 

The rest of the paper is structured as follows. Section 2 covers the background 
and notation that will be assumed throughout the paper. We present the Backward 
DFAS approach in Section 3, and the Forward DFAS approach in Section 4. 
Section 5 discusses our implementation and evaluation. Section 6 discusses related 
work, and Section 7 concludes the paper. 


2 Background and Terminology 


Vector addition systems with states or VASS [27] are a popular modelling tech- 
nique for distributed systems. We begin this section by defining an extension to 
VASS, which we call a VASS-Control Flow Graph or VCFG. 


Definition 1. A VASS-Control Flow Graph or VCFG G is a graph, and is 
described by the tuple $(Q, \delta, r, q_0, V, \pi, \theta)$, where 

$Q$ is a finite set of nodes, $\delta \subseteq Q \times Q$ is a finite set of edges, 

$r \in \mathbb{N}$, $q_0$ is the start node, $V$ is a set of variables or memory locations, 

$\pi : \delta \to A$ maps each edge to an action, where $A = ((V \to \mathbb{Z}) \to (V \to \mathbb{Z}))$, 
$\theta : \delta \to \mathbb{Z}^r$ maps each edge to a vector in $\mathbb{Z}^r$. 


For any edge $e = (q_1, q_2) \in \delta$, if $\pi(e) = a$ and $\theta(e) = w$, then $a$ is called 
the action of $e$ and $w$ is called the queuing vector of $e$. This edge is depicted as 
$q_1 \xrightarrow{a,\, w} q_2$. The variables and the actions are the only additional 
features of a VCFG over VASS. 

A configuration of a VCFG is a tuple $(q, c, \xi)$, where $q \in Q$, $c \in \mathbb{N}^r$ and 
$\xi \in (V \to \mathbb{Z})$. The initial configuration of a VCFG is $(q_0, \mathbf{0}, \xi_0)$, where $\mathbf{0}$ denotes 
a vector with $r$ zeroes, and $\xi_0$ is a given initial valuation for the variables. The 
VCFG can be said to have $r$ counters. The vector $c$ in each configuration can 
be thought of as a valuation to the counters. The transitions between VCFG 
configurations are according to the rule below: 


$$\frac{e = (q_1, q_2),\quad e \in \delta,\quad \pi(e) = a,\quad \theta(e) = w,\quad a(\xi_1) = \xi_2,\quad c_1 + w = c_2,\quad c_2 \geq 0}{(q_1, c_1, \xi_1) \Rightarrow_e (q_2, c_2, \xi_2)}$$
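
The transition rule can be read operationally as a partial step function on configurations. The Python sketch below is one possible rendering; representing nodes as strings, counters as tuples, valuations as dictionaries, and the name `step` are assumptions of this example.

```python
from typing import Callable, Dict, Optional, Tuple

Valuation = Dict[str, int]
Config = Tuple[str, Tuple[int, ...], Valuation]


def step(config: Config, edge: Tuple[str, str],
         action: Callable[[Valuation], Valuation],
         w: Tuple[int, ...]) -> Optional[Config]:
    """Apply edge e = (q1, q2) with action a and queuing vector w, if enabled."""
    q, c, xi = config
    q1, q2 = edge
    if q != q1:
        return None                       # the edge does not leave this node
    c2 = tuple(ci + wi for ci, wi in zip(c, w))
    if any(x < 0 for x in c2):
        return None                       # the premise c2 >= 0 fails
    return (q2, c2, action(dict(xi)))     # xi2 = a(xi1)


def identity(xi: Valuation) -> Valuation:
    return xi


def increment_x(xi: Valuation) -> Valuation:
    return {**xi, "x": xi["x"] + 1}


init: Config = ("q0", (0, 0), {"x": 0})
after_send = step(init, ("q0", "q1"), identity, (1, 0))
blocked_receive = step(init, ("q0", "q1"), identity, (-1, 0))
after_update = step(after_send, ("q1", "q0"), increment_x, (0, 0))
```

The `blocked_receive` case shows how the non-negativity premise rules out a receive when the corresponding counter is zero.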


2.1 Modeling of Asynchronous Message Passing Systems as VCFGs 


Asynchronous systems are composed of a finite number of independently executing 
processes that communicate with each other by passing messages along FIFO 
channels. The processes may have local variables, and there may exist shared (or 
global) variables as well. For simplicity of presentation we assume all variables 
are global. 




Fig. 2. (a) Asynchronous system with two processes, (b) its VCFG model 


Figure 2(a) shows a simple asynchronous system with two processes. In this 
system there are two channels, c1 and c2, and a message alphabet consisting of 
two elements, m1 and m2. The semantics we assume for message-passing systems 
is the same as what is used by the tool Spin [25]. A configuration of the system 
consists of the current control states of all the processes, the contents of all the 
channels, and the values of all the variables. A single transition of the system 
consists of a transition of one of the processes from its current control-state to a 
successor control state, accompanied with the corresponding queuing operation 
or variable-update action. A transition labeled c!m can be taken unconditionally, 
and results in ‘m’ being appended to the tail of the channel ‘c’. A transition 
labeled c?m can be taken only if an instance of ‘m’ is available at the head 
of ‘c’, and results in this instance getting removed from ‘c’. (Note, based on 
the context, we over-load the term “message” to mean either an element of the 
message alphabet, or an instance of a message-alphabet element in a channel at 
run-time.) 

Asynchronous systems can be modeled as VCFGs, and our approach performs 
data flow analysis on VCFGs. We now illustrate how an asynchronous system 
can be modeled as a VCFG. We assume a fixed number of processes in the 
system. We do this illustration using the example VCFG in Figure 2(b), which 
models the system in Figure 2(a). Each node of the VCFG represents a tuple 
of control-states of the processes, while each edge corresponds to a transition 
of the system. The action of a VCFG edge is identical to the action that labels 
the corresponding process transition. (“id” in Figure 2(b) represents the identity
action.) The VCFG will have as many counters as the number of unique pairs
$(c_i, m_j)$ such that the operation $c_i\,!\,m_j$ is performed by any process. If an edge
e in the VCFG corresponds to a send transition $c_i\,!\,m_j$ of the system, then e’s
queuing vector would have a +1 for the counter corresponding to $(c_i, m_j)$ and
a zero for all the other counters. Analogously, a receive operation gets modeled
as -1 in the queuing vector. In Figure 2(b), the first counter is for $(c_1, m_1)$ while
the second counter is for $(c_2, m_2)$. Note that the +1 and -1 encoding (which
is inherited from VASSs) effectively causes FIFO channels to be treated as
unordered channels.
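
The counter encoding described above can be sketched in Python as follows. The tuple format `(kind, channel, message)` and the helper names are assumptions made for illustration only.

```python
def counter_index(ops):
    """Assign one counter to each distinct (channel, message) pair that is sent."""
    pairs = sorted({(c, m) for kind, c, m in ops if kind == "send"})
    return {pair: i for i, pair in enumerate(pairs)}

def queuing_vector(op, index):
    """+1 for a send and -1 for a receive on the counter of the pair; zeros elsewhere.

    `op` is None for transitions that perform no queuing operation. A receive of a
    pair that is never sent has no counter here (it could never fire anyway).
    """
    vec = [0] * len(index)
    if op is not None:
        kind, c, m = op
        vec[index[(c, m)]] = 1 if kind == "send" else -1
    return tuple(vec)

# Operations of a hypothetical system with two channels and two messages.
ops = [("send", "c1", "m1"), ("recv", "c1", "m1"),
       ("send", "c2", "m2"), ("recv", "c2", "m2")]
index = counter_index(ops)
```

Because only per-pair counts are tracked, the order of messages inside a channel is lost, which is exactly the unordered-channel abstraction mentioned above.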

When each process can invoke procedures as part of its execution, such 
systems can be modeled using inter-procedural VCFGs, or iVCFGs. These are 
extensions of VCFGs just as standard inter-procedural control-flow graphs are 
extensions of control-flow graphs. Constructing an iVCFG for a given system 
is straightforward, under a restriction that at most one of the processes in the 
system can be executing a procedure other than its main procedure at any time. 
This restriction is also present in other related work [29,5]. 


2.2 Data flow analysis over iVCFGs 


Data flow analysis is based on a given complete lattice $\mathcal{L}$, which serves as the
abstract domain. As a prerequisite step before we can perform our data flow
analysis on iVCFGs, we first consider each edge $v \xrightarrow{a,\,w} w$ in each procedure
in the iVCFG, and replace the (concrete) action a with an abstract action
f, where $f : \mathcal{L} \to \mathcal{L}$ is a given abstract transfer function that conservatively
over-approximates [12] the behavior of the concrete action a.

Let p be a path in an iVCFG, let $p_0$ be the first node in the path, and let $\xi_i$ be
a valuation to the variables at the beginning of p. The path p is said to be feasible
if, starting from the configuration $(p_0, \mathbf{0}, \xi_i)$, the configuration $(q, d, \xi)$ obtained at
each successive point in the path is such that $d \ge \mathbf{0}$, with successive configurations
along the path being generated as per the rule for transitions among VCFG
configurations that was given before Section 2.1. For any path $p = e_1 e_2 \ldots e_k$
of an iVCFG, we define its path transfer function ptf(p) as $f_{e_k} \circ f_{e_{k-1}} \circ \cdots \circ f_{e_1}$,
where $f_e$ is the abstract action associated with edge e.


Fig. 3. Example iVCFG 
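
The notions of feasibility and path transfer function defined above can be sketched as follows. This is a minimal illustration that assumes a path is given as two parallel lists (queuing vectors and abstract actions as Python callables); the function names are not from the paper.

```python
from functools import reduce

def is_feasible(vectors):
    """vectors: queuing vectors of the edges of a path, in path order."""
    if not vectors:
        return True
    counters = [0] * len(vectors[0])       # every path starts with all counters at zero
    for w in vectors:
        counters = [c + d for c, d in zip(counters, w)]
        if any(c < 0 for c in counters):   # a receive with no matching earlier send
            return False
    return True

def ptf(transfer_functions):
    """Compose f_e1, ..., f_ek so that f_e1 is applied first and f_ek last."""
    return reduce(lambda acc, f: (lambda d: f(acc(d))),
                  transfer_functions, lambda d: d)
```

For instance, a send followed by a receive on the same counter is feasible, while the reverse order is not.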


The standard data flow analysis problem for sequential programs is to compute 
the join-over-all-paths (JOP) solution. Our problem statement is to compute the 
join-over-all-feasible-paths (JOFP) solution for iVCFGs. Formally stated, if start 
is the entry node of the “main” procedure of the iVCFG, given any node target 
in any procedure of the iVCFG, and an “entry” value $d_0 \in \mathcal{L}$ at start such that
$d_0$ conservatively over-approximates $\xi_0$, we wish to compute the JOFP value at
target as defined by the following expression:

$$\bigsqcup_{\substack{p \text{ is a feasible and interprocedurally valid} \\ \text{path in the iVCFG from } start \text{ to } target}} (ptf(p))(d_0)$$


Intuitively, due to the unordered channel abstraction, every run of the system 
corresponds to a feasible path in the iVCFG, but not vice versa. Hence, the 
JOFP solution above is guaranteed to conservatively over-approximate the JOP 
solution on the runs of the system (which is not computable in general). 


3 Backward DFAS Approach 


In this section we present our key contribution — the Backward DFAS (Data Flow 
Analysis of Asynchronous Systems) algorithm — an interprocedural algorithm 
that computes the precise JOFP at any given node of the iVCFG. 

We begin by presenting a running example, which is the iVCFG with two 
procedures depicted in Figure 3. There is only one channel and one message in the 
message alphabet in this example, and hence the queuing vectors associated with 
the edges are of size 1. The edges without the vectors are implicitly associated 
with zero vectors. The actions associated with edges are represented in the form 
of assignment statements. The edges without assignment statements next to 
them have identity actions. The upper part of Figure 3, consisting of nodes
a, b, p, q, h, i, j, k, l, is the VCFG of the “main” procedure. The remaining nodes
constitute the VCFG of the (tail) recursive procedure foo. The solid edges are 
intra-procedural edges, while dashed edges are inter-procedural edges. 

Throughout this section we use Linear Constant Propagation (LCP) [51] as 
our example data flow analysis. LCP, like CP, aims to identify the variables that 
have constant values at any given location in the system. LCP is based on the 
same infinite domain as CP; i.e., each abstract domain element is a mapping from 
variables to (integer) values. The “$\sqsubseteq$” relation for the LCP lattice is also defined
in the same way as for CP. The encoding of the transfer functions in LCP is as
follows. Each edge (resp. path) maps the outgoing value of each variable to either
a constant, or to a linear expression in the incoming value of at most one variable
into the edge (resp. path), or to a special symbol $\top$ that indicates an unknown
outgoing value. For instance, for the edge $g \to m$ in Figure 3, its transfer function
can be represented symbolically as (t’=t, x’=x+1, y’=y, z’=z), where the primed
versions represent outgoing values and unprimed versions represent incoming 
values. 
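
One possible way to encode such symbolic LCP transfer functions is sketched below. The tuple encoding (`("const", c)`, `("lin", a, var, b)` for a·var + b, and `TOP`) and the helper names are assumptions for illustration; the paper does not prescribe a concrete representation.

```python
TOP = "TOP"  # unknown outgoing value

def const(c):
    return ("const", c)

def lin(a, var, b):
    """a * var + b, normalised to a constant when a == 0."""
    return ("const", b) if a == 0 else ("lin", a, var, b)

def compose(first, second):
    """Transfer function of taking `first` and then `second`."""
    out = {}
    for var, rhs in second.items():
        if rhs == TOP or rhs[0] == "const":
            out[var] = rhs                      # does not depend on incoming values
            continue
        _, a, src, b = rhs
        inner = first[src]                      # value of src produced by `first`
        if inner == TOP:
            out[var] = TOP
        elif inner[0] == "const":
            out[var] = const(a * inner[1] + b)
        else:
            _, a2, src2, b2 = inner
            out[var] = lin(a * a2, src2, a * b2 + b)
    return out

# Edge g -> m of Figure 3: (t'=t, x'=x+1, y'=y, z'=z)
g_to_m = {"t": lin(1, "t", 0), "x": lin(1, "x", 1),
          "y": lin(1, "y", 0), "z": lin(1, "z", 0)}
twice = compose(g_to_m, g_to_m)
```

Composition stays inside the encoding: a linear expression of a linear expression in one variable is again linear in one variable, which is what keeps path transfer functions representable.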

Say we wish to compute the JOFP at node k. The only feasible paths that 
reach node k are the ones that attain calling-depth of three or more in the 
procedure foo, and hence encounter at least three send operations, which are 
required to clear the three receive operations encountered from node h to node k. 
All such paths happen to bring the constant values (t = 1, z = 1) to the node k. 
Hence, (t = 1, z = 1) is the precise JOFP result at node k. However, infeasible 
paths, if not elided, can introduce imprecision. For instance, the path that directly 
goes from node c to node o in the outermost call to the Procedure foo (this path 
is of calling-depth zero) brings values of zero for all four variables, and would 
hence prevent the precise fact (t = 1, z = 1) from being inferred. 


3.1 Assumptions and Definitions 


The set of all $\mathcal{L} \to \mathcal{L}$ transfer functions clearly forms a complete lattice based on
the following ordering: $f_1 \sqsupseteq f_2$ iff for all $d \in \mathcal{L}$, $f_1(d) \sqsupseteq f_2(d)$. Backward DFAS
makes a few assumptions on this lattice of transfer functions. The first is that
this lattice be of finite height; i.e., all strictly ascending chains of elements in
this lattice are finite (although no a priori bound on the sizes of these chains is
required). The second is that a representation of transfer functions is available,
as are operators to compose, join, and compare transfer functions. Note, the two
assumptions above are also made by the classical “functional” inter-procedural
approach of Sharir and Pnueli [55]. Thirdly, we need distributivity, as defined
below: for any $f_1, f_2, f \in \mathcal{L} \to \mathcal{L}$, $(f_1 \sqcup f_2) \circ f = (f_1 \circ f) \sqcup (f_2 \circ f)$. The distributivity
assumption is required only if the given system contains recursive procedure calls.

Linear Constant Propagation (LCP) [51] and Affine Relationships Analysis 
(ARA) [46] are well-known examples of analyses based on infinite abstract domains 
that satisfy all of the assumptions listed above. Note that the CP transfer-functions
lattice is not of finite height. Despite the LCP abstract domain being
the same as the CP abstract domain, the encoding chosen for LCP transfer
functions (which was mentioned above) ensures that LCP uses a strict, finite-height
subset of the full CP transfer-functions lattice that is closed under join and
function composition operations. The trade-off is that LCP transfer functions for
assignment statements whose RHS is not a linear expression and for conditionals 
are less precise than the corresponding CP transfer functions. 

Our final assumption is that procedures other than “main” may send messages, 
but should not have any “receive” operations. Previous approaches that have 
addressed data flow analysis or verification problems for asynchronous systems 
with recursive procedures also have the same restriction [54,29,19]. 

We now introduce important terminology. The demand of a given path p in 
the VCFG is a vector of size r, and is defined as follows: 


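$$demand(p) = \begin{cases} \max(\mathbf{0} - w, \mathbf{0}), & \text{if } p = (v \xrightarrow{f,\,w} z) \\ \max(demand(p') - w, \mathbf{0}), & \text{if } p = (e.p'), \text{ where } e \equiv (v \xrightarrow{f,\,w} z) \end{cases}$$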


Intuitively, the demand of a path p is the minimum required vector of counter 
values in any starting configuration at the entry of the path for there to exist a 
sequence of transitions among configurations that manages to traverse the entire 
path (following the rule given before Section 2.1). It is easy to see that a path p 
is feasible iff demand(p) = 0. 
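
The recursive definition of demand can be evaluated by peeling edges off from the end of the path. The sketch below assumes a path is given as the list of its queuing vectors in path order; the single-counter examples are hypothetical.

```python
def demand(vectors):
    """Demand of a non-empty path, given the queuing vectors of its edges in path order."""
    dem = [0] * len(vectors[0])
    for w in reversed(vectors):   # base case on the last edge, then prepend edges
        dem = [max(d - x, 0) for d, x in zip(dem, w)]
    return tuple(dem)

three_receives = [(-1,), (-1,), (-1,)]
three_sends = [(1,), (1,), (1,)]
```

A path consisting of three receives needs three messages to be available at its entry, whereas prefixing it with three sends brings the demand down to zero, i.e., makes it feasible.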

A set of paths C is said to cover a path p iff: (a) all paths in C have the
same start and end nodes (respectively) as p, (b) for each $p' \in C$, $demand(p') \le demand(p)$,
and (c) $(\bigsqcup_{p' \in C} ptf(p')) \sqsupseteq ptf(p)$. (Regarding (b), any binary vector
operation in this paper is defined as applying the same operation on every pair
of corresponding entries, i.e., point-wise.)

A path template $(p_1, p_2, \ldots, p_n)$ of any procedure $F_i$ is a sequence of paths
in the VCFG of $F_i$ such that: (a) path $p_1$ begins at the entry node $en_{F_i}$ of $F_i$
and path $p_n$ ends at return node $ex_{F_i}$ of $F_i$, (b) for all $p_i$, $1 \le i < n$, $p_i$ ends at
a call-site node, and (c) for all $p_i$, $1 < i \le n$, $p_i$ begins at a return-site node $v_r^i$,
such that $v_r^i$ corresponds to the call-site node $v_c^{i-1}$ at which $p_{i-1}$ ends.


3.2 Properties of Demand and Covering 


At a high level, Backward DFAS works by growing paths in the backward direction 
by a single edge at a time starting from the target node (node k in our example 
in Figure 3). Every time this process results in a path reaching the start node 
(node a in our example), and the path is feasible, the approach simply transfers
the entry value $d_0$ via this path to the target node. The main challenge is that due
to the presence of cycles and recursion, there are an infinite number of feasible 
paths in general. In this subsection we present a set of lemmas that embody our 
intuition on how a finite subset of the set of all paths can be enumerated such 
that the join of the values brought by these paths is equal to the JOFP. We then 
present our complete approach in Section 3.3. 

Demand Coverage Lemma: Let $p_2$ and $p_2'$ be two paths from a node $v_i$ to
a node $v_j$ such that $demand(p_2') \le demand(p_2)$. If $p_1$ is any path ending at $v_i$,
then $demand(p_1.p_2') \le demand(p_1.p_2)$.

This lemma can be argued using induction on the length of path $p_1$. A
similar observation has been used to solve coverability of lossy channels and
well-structured transition systems in general [3,18,2]. An important corollary
of this lemma is that for any two paths $p_2'$ and $p_2$ from $v_i$ to $v_j$ such that
$demand(p_2') \le demand(p_2)$, if there exists a path $p_1$ ending at $v_i$ such that $p_1.p_2$
is feasible, then $p_1.p_2'$ is also feasible.

Function Coverage Lemma: Let $p_2$ be a path from a node $v_i$ to a node
$v_j$, and $P_2$ be a set of paths from $v_i$ to $v_j$ such that $(\bigsqcup_{p_2' \in P_2} ptf(p_2')) \sqsupseteq ptf(p_2)$.
Let $p_1$ be any path ending at $v_i$ and $p_3$ be any path beginning at $v_j$. Under
the distributivity assumption stated in Section 3.1, the following property holds:
$(\bigsqcup_{p_2' \in P_2} ptf(p_1.p_2'.p_3)) \sqsupseteq ptf(p_1.p_2.p_3)$.

The following result follows from the Demand and Function Coverage Lemmas 
and from monotonicity of the transfer functions: 


Corollary 1: Let $p_2$ be a path from a node $v_i$ to a node $v_j$, and $P_2$ be a set
of paths from $v_i$ to $v_j$ such that $P_2$ covers $p_2$. Let $p_1$ be any path ending at $v_i$.
Then, the set of paths $\{p_1.p_2' \mid p_2' \in P_2\}$ covers the path $p_1.p_2$.

We now use the running example from Figure 3 to illustrate how we leverage 
Corollary 1 in our approach. When we grow paths in backward direction from 
the target node k, two candidate paths that would get enumerated (among
others) are $p_i = hijk$ and $p_j = hijkhijk$ (in that order). Now, $p_i$ covers $p_j$.
Therefore, by Corollary 1, any backward extension $p_1.p_j$ of $p_j$ ($p_1$ is any path
prefix) is guaranteed to be covered by the analogous backward extension $p_1.p_i$
of $p_i$. By definition of covering, it follows that $p_1.p_i$ brings in a data value that
conservatively over-approximates the value brought in by $p_1.p_j$. Therefore, our
approach discards $p_j$ as soon as it gets enumerated. To summarize, our approach
discards any path as soon as it is enumerated if it is covered by some subset of 
the previously enumerated and retained paths. 


Due to the finite height of the transfer functions lattice, and because demand 
vectors cannot contain negative values, at some point in the algorithm every 
new path that can be generated by backward extension at that point would 
be discarded immediately. At this point the approach would terminate, and 
soundness would be guaranteed by definition of covering. 

In the inter-procedural setting the situation is more complex. We first present 
two lemmas that set the stage. The lemmas both crucially make use of the 
assumption that recursive procedures are not allowed to have “receive” operations. 
For any path $p_a$ that contains no receive operations, and for any demand vector
d, we first define $supply(p_a, d)$ as $\min(s, d)$, where s is the sum of the queuing
vectors of the edges of $p_a$.

Supply Limit Lemma: Let $p_1, p_2$ be two paths from $v_i$ to $v_j$ such that
there are no receive operations in $p_1$ and $p_2$. Let $p_b$ be any path beginning at $v_j$.
If $demand(p_b) = d$, and if $supply(p_1, d) \ge supply(p_2, d)$, then $demand(p_1.p_b) \le demand(p_2.p_b)$.
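
A small numeric check can make the Supply Limit Lemma concrete. The sketch below assumes a single counter and represents paths as lists of queuing vectors; `demand` follows the definition in Section 3.1, and the specific paths are hypothetical.

```python
def demand(vectors):
    """Demand of a non-empty path, given its queuing vectors in path order."""
    dem = [0] * len(vectors[0])
    for w in reversed(vectors):
        dem = [max(d - x, 0) for d, x in zip(dem, w)]
    return tuple(dem)

def supply(vectors, d):
    """supply(p_a, d) = min(s, d); p_a is assumed to contain no receive operations."""
    s = [sum(col) for col in zip(*vectors)]
    return tuple(min(si, di) for si, di in zip(s, d))

p_b = [(-1,)] * 3      # suffix with three receives
d = demand(p_b)        # demand of the suffix
p_1 = [(1,)] * 4       # prefix with four sends
p_2 = [(1,)] * 2       # prefix with two sends
```

Here the prefix with the larger supply yields the smaller overall demand, and any sends beyond the demand d of the suffix make no further difference, since supply is capped at d.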

A set of paths P is said to d-supply-cover a path $p_a$ iff: (a) all paths
in P have the same start node and same end node (respectively) as $p_a$, (b)
$(\bigsqcup_{p' \in P} ptf(p')) \sqsupseteq ptf(p_a)$, and (c) for each $p' \in P$, $supply(p', d) \ge supply(p_a, d)$.

Supply Coverage Lemma: If $p_a.p_b$ is a path, and $demand(p_b) = d$, and
if a set of paths P d-supply-covers $p_a$, and $p_a$ as well as all paths in P have no
receive operations, then the set of paths $\{p'.p_b \mid p' \in P\}$ covers the path $p_a.p_b$.

Proof argument: Since P d-supply-covers $p_a$, by the Supply Limit Lemma,
we have (a): for all $p' \in P$, $demand(p'.p_b) \le demand(p_a.p_b)$. Since P d-supply-covers
$p_a$, we also have $(\bigsqcup_{p' \in P} ptf(p')) \sqsupseteq ptf(p_a)$. From this, we use the Function
Coverage Lemma to infer that (b): $(\bigsqcup_{p' \in P} ptf(p'.p_b)) \sqsupseteq ptf(p_a.p_b)$. The result now
follows from (a) and (b).

Consider path hijk in our example, which gets enumerated and retained (as
discussed earlier). This path gets extended back as qhijk; let us denote this path
as p’. Let d be the demand of p’ (i.e., d is equal to 3). Our plan now is to extend
this path in the backward direction all the way up to node p, by prepending
interprocedurally valid and complete (i.e., IVC) paths of procedure foo in front
of p’. An IVC path is one that begins at the entry node of foo, ends at the
return node of foo, is of arbitrary calling depth, has balanced calls and returns,
and has no pending returns when it completes [50]. First, we enumerate the IVC
path(s) with calling-depth zero (i.e., path co in the example), and prepend them
in front of p’. We then produce deeper IVC paths, in phases. In each phase i,
$i > 0$, we inline IVC paths of calling-depth $i - 1$ that have been enumerated and
retained so far into the path templates of the procedure to generate IVC paths of
calling-depth i, and prepend these IVC paths in front of p’. We terminate when
each IVC path that is generated in a particular phase j is d-supply-covered by
some subset P of IVC paths generated in previous phases.


The soundness of discarding the IVC paths of phase j follows from the Supply
Coverage Lemma (p’ would take the place of $p_b$ in the lemma’s statement, while
the path generated in phase j would take the place of $p_a$ in the lemma’s statement).
The termination condition is guaranteed to be reached eventually, because: (a) the 
supplies of all IVC paths generated are limited to d, and (b) the lattice of transfer 
functions is of finite height. Intuitively, we could devise a sound termination 
condition even though deeper and deeper IVC paths can increment counters 
more and more, because a deeper IVC path that increments the counters beyond 
the demand of p’ does not really result in lower overall demand when prepended 
before p’ than a shallower IVC path that also happens to meet the demand of p’ 
(Supply Limit lemma formalizes this). 

In our running example, for the path qhijk, whose demand is equal to three,
prefix generation for it happens to terminate in the fifth phase. The IVC paths
that get generated in the five phases are, respectively, $p_0 = co$, $p_1 = cdefgmcono$,
$p_2 = (cdefgm)^2co(no)^2$, $p_3 = (cdefgm)^3co(no)^3$, $p_4 = (cdefgm)^4co(no)^4$, and
$p_5 = (cdefgm)^5co(no)^5$. $supply(p_3, 3) = supply(p_4, 3) = supply(p_5, 3) = 3$. The
LCP transfer functions of the paths are as follows. $ptf(p_3)$ is (t’=1, x’=x+3,
y’=x+2, z’=1), $ptf(p_4)$ is (t’=1, x’=x+4, y’=x+3, z’=1), while $ptf(p_5)$ is (t’=1,
x’=x+5, y’=x+4, z’=1). $\{p_3, p_4\}$ 3-supply-covers $p_5$.
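
The last claim can be checked variable by variable. The worked step below assumes, as is usual for LCP, that the join of two distinct linear expressions for the same outgoing variable is $\top$:

```latex
\begin{align*}
ptf(p_3) \sqcup ptf(p_4)
  &= (t' = 1,\; x' = (x+3) \sqcup (x+4),\; y' = (x+2) \sqcup (x+3),\; z' = 1) \\
  &= (t' = 1,\; x' = \top,\; y' = \top,\; z' = 1) \\
  &\sqsupseteq (t' = 1,\; x' = x+5,\; y' = x+4,\; z' = 1) = ptf(p_5)
\end{align*}
```

Together with $supply(p_3, 3) = supply(p_4, 3) = 3 \ge supply(p_5, 3)$, this gives both conditions required by the definition of 3-supply-covering.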

We also need a result that when the IVC paths in the jth phase are d-supply-covered
by paths generated in preceding phases, then the IVC paths that would
be generated in the (j + 1)th phase would also be d-supply-covered by paths generated
in phases that preceded j. This can be shown using a variant of the Supply
Coverage Lemma, which we omit in the interest of space. Once this is shown, it
then follows inductively that none of the phases after phase j are required, which
would imply that it would be safe to terminate.

Algorithm 1 Backward DFAS algorithm

 1: procedure COMPUTEJOFP(target)
      ▷ Returns JOFP from start ∈ Nodes to target ∈ Nodes, entry value $d_0 \in \mathcal{L}$.
 2:   for all v ∈ Nodes do    ▷ Nodes is the set of all nodes in the VCFG
 3:     sPaths(v) = ∅
 4:   For each intra-proc VCFG edge v → target, add this edge to workList and to sPaths(v)
 5:   repeat
 6:     Remove any path p from workList.
 7:     Let $v_1$ be the start node of p.
 8:     if $v_1$ is a return-site node, with incoming return edge from func. $F_1$ then
 9:       Let $v_3$ be the call-site node corresponding to $v_1$, $e_1$ be the call-site-to-entry
          edge from $v_3$ to $en_{F_1}$, and $r_1$ be the exit-to-return-site edge from
          $ex_{F_1}$ to $v_1$.
10:       for all $p_1$ ∈ COMPUTEENDTOEND($F_1$, demand(p)) do
11:         $p_2 = e_1.p_1.r_1.p$
12:         if COVERED($p_2$, sPaths($v_3$)) returns false then
13:           Add $p_2$ to sPaths($v_3$) and to workList.
14:     else if $v_1$ is the entry node of a func. $F_1$ then
15:       for all $v_3$ ∈ call-sites($F_1$) do
16:         Let $e_1$ be the call edge from $v_3$ to $v_1$.
17:         $p_2 = e_1.p$
18:         if COVERED($p_2$, sPaths($v_3$)) returns false then
19:           Add $p_2$ to sPaths($v_3$) and to workList.
20:     else
21:       for all intra-procedural edges $e = v_3 \xrightarrow{f,\,w} v_1$ in the VCFG do
22:         if COVERED(e.p, sPaths($v_3$)) returns false then
23:           Add the path (e.p) to sPaths($v_3$) and to workList.
24:   until workList is empty
25:   P = {p | p ∈ sPaths(start), demand(p) = 0}
26:   return $\bigsqcup_{p \in P} (ptf(p))(d_0)$

The arguments presented above were in a restricted setting, namely, that 
there is only one call in each procedure, and that only recursive calls are allowed. 
These restrictions were assumed only for simplicity, and are not actually assumed 
in the algorithm to be presented below. 


3.3 Data Flow Analysis Algorithm 


Our approach is summarized in Algorithm 1. COMPUTEJOFP is the main routine.
The algorithm works on a given iVCFG (which is an implicit parameter to the
algorithm), and is given a target node at which the JOFP is to be computed.

Algorithm 2 Routines invoked for inter-procedural processing in Backward
DFAS algorithm

 1: procedure COMPUTEENDTOEND(F, d)
      ▷ Returns a set of paths that d-supply-covers each IVC path of the procedure F.
 2:   for all $F_i$ ∈ Funcs do
 3:     Place all 0-depth paths from $F_i$ in sIVCPaths($F_i$, d)
 4:   repeat
 5:     pathsAdded = false
 6:     for all path template $(p_1, p_2, \ldots, p_n)$ in any function $F_i$ ∈ Funcs do
 7:       Let $F_1$ be the procedure called from the call-site at which $p_1$ ends, $F_2$ be
          the procedure called from the call-site at which $p_2$ ends, and so on.
 8:       for all $\bar{p}_1$ ∈ sIVCPaths($F_1$, d), $\bar{p}_2$ ∈ sIVCPaths($F_2$, d), ... do
 9:         Let $p' = p_1.e_1.\bar{p}_1.r_1.p_2.e_2.\bar{p}_2.r_2.\ldots.p_n$, where each $e_i$ is the call-edge
            that leaves the call-site node at which $p_i$ ends and $r_i$ is the return
            edge corresponding to $e_i$.
10:         if DSCOVERED(p’, d, sIVCPaths($F_i$, d)) returns false then
11:           Add the path p’ to sIVCPaths($F_i$, d). pathsAdded = true.
12:   until pathsAdded is false
13:   return sIVCPaths(F, d)


A key data structure in the algorithm is sPaths; for any node v, sPaths(v) is 
the set of all paths that start from v and end at target that the algorithm has 
generated and retained so far. The workList at any point stores a subset of the 
paths in sPaths, and these are the paths of the iVCFG that need to be extended 
backward. 


To begin with, all edges incident onto target are generated and added to the 
sets sPaths and workList (Line 4 in Algorithm 1). In each step the algorithm 
picks up a path p from workList (Line 6), and extends this path in the backward 
direction. The backward extension has three cases based on the start node of 
the path p. The simplest case is the intra-procedural case, wherein the path is 
extended backwards in all possible ways by a single edge (Lines 21-23). The 
routine COVERED, whose definition is not shown in the algorithm, checks if its 
first argument (a path) is covered by its second argument (a set of paths). Note, 
covered paths are not retained. 


When the start node of p is the entry node of a procedure $F_1$ (Lines 14-19), the
path is extended backwards via all possible call-site-to-entry edges for procedure
$F_1$.

If the starting node of path p is a return-site node $v_1$ (Lines 8-13) in a calling
procedure, we invoke a routine COMPUTEENDTOEND (in Line 10 of Algorithm 1).
This routine, which we explain later, returns a set of IVC paths of the called
procedure such that every IVC path of the called procedure is d-supply-covered
by some subset of paths in the returned set, where d denotes demand(p). These
returned IVC paths are prepended before p (Line 11), with the call-edge $e_1$ and
return edge $r_1$ appropriately inserted.

The final result returned by the algorithm (see Lines 25 and 26 in Algorithm 1)
is the join of the values transferred by the zero-demand paths (i.e., feasible paths)
starting from the given entry value $d_0 \in \mathcal{L}$.
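
To make the worklist structure concrete, here is a minimal Python sketch of the intra-procedural part of Algorithm 1 only (Line 4 and Lines 20-26). It is not the authors' implementation. As simplifying assumptions, it uses a gen/kill "may" domain (sets of facts, join is union) instead of LCP, stores for each retained path only its demand and path transfer function, and checks covering against the join of all retained paths whose demand is small enough. All names and the example graph are hypothetical.

```python
from collections import defaultdict

# A transfer function is a (kill, gen) pair of frozensets: f(S) = (S - kill) | gen.
IDENT = (frozenset(), frozenset())

def gen(*facts):
    return (frozenset(), frozenset(facts))

def apply(tf, s):
    return (s - tf[0]) | tf[1]

def compose(first, second):
    """Transfer function of applying `first` and then `second`."""
    return (first[0] | second[0], (first[1] - second[0]) | second[1])

def join(f1, f2):
    return (f1[0] & f2[0], f1[1] | f2[1])

def leq(f1, f2):
    """f1(S) is a subset of f2(S) for every S."""
    return f1[1] <= f2[1] and f2[0] <= (f1[0] | f2[1])

def covered(dem, tf, retained):
    """True if the retained summaries with demand <= dem jointly dominate tf."""
    cands = [t for d, t in retained if all(a <= b for a, b in zip(d, dem))]
    if not cands:
        return False
    total = cands[0]
    for t in cands[1:]:
        total = join(total, t)
    return leq(tf, total)

def compute_jofp(edges, start, target, d0):
    """edges: list of (src, dst, tf, w) tuples of a single procedure."""
    s_paths = defaultdict(list)     # node -> retained (demand, ptf) summaries
    work = []
    for src, dst, tf, w in edges:   # Line 4: edges incident onto target
        if dst == target:
            item = (tuple(max(-x, 0) for x in w), tf)
            s_paths[src].append(item)
            work.append((src,) + item)
    while work:                     # Lines 5-24, intra-procedural case only
        v1, dem, tf = work.pop()
        for src, dst, f, w in edges:
            if dst != v1:
                continue
            new_dem = tuple(max(d - x, 0) for d, x in zip(dem, w))
            new_tf = compose(f, tf)             # edge first, then the old path
            if not covered(new_dem, new_tf, s_paths[src]):
                s_paths[src].append((new_dem, new_tf))
                work.append((src, new_dem, new_tf))
    result = frozenset()            # Lines 25-26: join over zero-demand paths
    for dem, tf in s_paths[start]:
        if not any(dem):
            result |= apply(tf, d0)
    return result

# Hypothetical graph: the receive on b -> c can only be cleared by the send on a -> b.
edges = [("a", "b", gen("sent"), (1,)),
         ("a", "b", gen("skipped"), (0,)),
         ("b", "b", gen("looped"), (0,)),
         ("b", "c", IDENT, (-1,))]
jofp_at_c = compute_jofp(edges, "a", "c", frozenset())
```

In this example the fact "skipped" is excluded from the result because every path through the second a-to-b edge has non-zero demand, while the self-loop on b is unrolled only once before its extensions are discarded as covered.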


Routine COMPUTEENDTOEND: This routine is specified in Algorithm 2, and 
is basically a generalization of the approach that we described in Section 3.2, 
now handling multiple call-sites in each procedure, mutual recursion, calls to 
non-recursive procedures, etc. We do assume for simplicity of presentation that 
there are no cycles (i.e., loops) in the procedures, as this results in a fixed number 
of path templates in each procedure. There is no loss of generality here because 
we allow recursion. The routine incrementally populates a group of sets — there 
is a set named sIVCPaths($F_i$, d) for each procedure $F_i$ in the system. The idea
is that when the routine completes, sIVCPaths($F_i$, d) will contain a set of IVC
paths of $F_i$ that d-supply-cover all IVC paths of $F_i$. Note that we simultaneously
populate covering sets for all the procedures in the system in order to handle 
mutual recursion. 

The routine COMPUTEENDTOEND first enumerates and saves all zero-depth 
paths in all procedures (see Line 3 in Algorithm 2). The routine then iteratively 
takes a path template at a time, and fills in the “holes” between corresponding
(call-site, return-site) pairs of the form $v_c^{i-1}, v_r^i$ in the path template with IVC
paths of the procedure that is called from this pair of nodes, thus generating
a deeper IVC path (see the loop in Lines 6-11). A newly generated IVC path
p’ is retained only if it is not d-supply-covered by other IVC paths already
generated for the current procedure $F_i$ (Lines 10-11). The routine terminates
when no more IVC paths that can be retained are generated, and returns the set 
sIVCPaths(F, d). 


3.4 Illustration 


We now illustrate our approach using the example in Figure 3. Algorithm 1 would 
start from the target node k, and would grow paths one edge at a time. After four 
steps the path hijk would be added to sPaths(h) (the intermediate steps would 
add suffixes of this path to sPaths(i), sPaths(j), and sPaths(k)). Next, path khijk 
would be generated and discarded, because it is covered by the “root” path k. 
Hence, further iterations of the cycle are avoided. On the other hand, the path 
hijk would get extended back to node q, resulting in path qhijk being retained
in sPaths(q). This path would trigger a call to routine COMPUTEENDTOEND.
As discussed in Section 3.2, this routine would return the following set of paths:
$p_0 = co$, and $p_i = (cdefgm)^i co(no)^i$ for each $1 \le i \le 4$. (Recall, as discussed in
Section 3.2, that $(cdefgm)^5 co(no)^5$ and deeper IVC paths are 3-supply-covered
by the paths $\{p_3, p_4\}$.)

Each of the paths returned above by the routine COMPUTEENDTOEND
would be prepended in front of qhijk, with the corresponding call and return
edges inserted appropriately. These paths would then be extended back to node a.
Hence, the final set of paths in sPaths(a) would be abpcoqhijk, abpcdefgmconoqhijk,
$abp(cdefgm)^2co(no)^2$, $abp(cdefgm)^3co(no)^3$, and $abp(cdefgm)^4co(no)^4$. Of these
paths, the first two are ignored, as they are not feasible. The initial data-flow
value (in which all variables are non-constant) is sent via the remaining three
paths. In all these three paths the final values of variables ‘t’ and ‘z’ are one.
Hence, these two constants are inferred at node k.


3.5 Properties of the algorithm 


We provide argument sketches here about the key properties of Backward DFAS. 
Detailed proofs are available in the appendix that accompanies this paper [4]. 

Termination. The argument is by contradiction. For the algorithm to not
terminate, one of the following two scenarios must happen. The first is that
an infinite sequence of paths gets added to some set sPaths(v). By Higman’s
lemma it follows that embedded within this infinite sequence there is an infinite
sequence $p_1, p_2, \ldots$, such that for all i, $demand(p_i) \le demand(p_{i+1})$. Because the
algorithm never adds covered paths, it follows that for all i:
$\bigsqcup_{1 \le k \le i+1} ptf(p_k) \sqsupset \bigsqcup_{1 \le k \le i} ptf(p_k)$. However, this contradicts the assumption that the lattice of
transfer functions is of finite height. The second scenario is that an infinite
sequence of IVC paths gets added to some set sIVCPaths(F, d) for some procedure
F and some demand vector d in some call to routine COMPUTEENDTOEND.
Because the “supply” values of the IVC paths are bounded by d, it follows that
embedded within the infinite sequence just mentioned there must exist an infinite
sequence of paths $p_1, p_2, \ldots$, such that for all i, $supply(p_i, d) \ge supply(p_{i+1}, d)$.
However, since d-supply-covered paths are never added, it follows that for all i:
$\bigsqcup_{1 \le k \le i+1} ptf(p_k) \sqsupset \bigsqcup_{1 \le k \le i} ptf(p_k)$. However, this contradicts the assumption
that the lattice of transfer functions is of finite height.

Soundness and Precision. We already argued informally in Section 3.2 that 
the algorithm explores all feasible paths in the system, omitting only paths that 
are covered by other already-retained paths. By definition of covering, this is 
sufficient to guarantee over-approximation of the JOFP. The converse direction,
namely, under-approximation, is easy to see, as every path along which the
data flow value $d_0$ is sent at the end of the algorithm is a feasible path. Together,
these two results imply that the algorithm is guaranteed to compute the precise 
JOFP. 

Complexity. We show the complexity of our approach in the single-procedure 
setting. Our analysis follows along the lines of the analysis of the backwards 
algorithm for coverability in VASS [6]. The overall idea is to use the technique
of Rackoff [48] to derive a bound on the length of the paths that need to be
considered. We derive a complexity bound of $O(\Delta \cdot h^2 \cdot L^{2r+1} \cdot r \cdot \log(L))$, where $\Delta$
is the total number of transitions in the VCFG, Q is the number of VCFG nodes,
h is the height of the lattice of $\mathcal{L} \to \mathcal{L}$ functions, and $L = (Q \cdot (h + 1) \cdot 2)^{(3r)!+1}$.


4 Forward DFAS Approach 


The Backward DFAS approach, though precise, requires the transfer function 
lattice to be of finite height. Due to this restriction, infinite-height abstract
domains like Octagons [44], which need widening [12], are not accommodated
by Backward DFAS. To address this, we present the Forward DFAS approach, 
which admits any complete lattice as an abstract domain (if the lattice is of 
infinite height then a widening operator should also be provided). The trade-off 
is precision. Forward DFAS elides only some of the infeasible paths in the VCFG, 
and hence, in general, computes a conservative over-approximation of the JOFP. 
Forward DFAS is conceptually not as sophisticated as Backward DFAS, but is 
still a novel proposal from the perspective of the literature. 

The Forward DFAS approach is structured as an instantiation of Kildall’s 
data flow analysis framework [32]. This framework needs a given complete lattice, 
the elements of which will be propagated around the VCFG as part of the fix 
point computation. Let $\mathcal{L}$ be the given underlying finite or infinite complete
lattice. $\mathcal{L}$ either needs to not have any infinite ascending chains (e.g., Constant
Propagation), or $\mathcal{L}$ needs to have an associated widening operator “$\nabla_{\mathcal{L}}$”. The
complete lattice D that we use in our instantiation of Kildall’s framework is defined
as $D \equiv D_{r,\kappa} \to \mathcal{L}$, where $\kappa \ge 0$ is a user-given non-negative integer, and $D_{r,\kappa}$ is
the set of all vectors of size r (where r is the number of counters in the VCFG)
such that all entries of the vectors are integers in the range $[0, \kappa]$. The ordering
on this lattice is as follows: $(d_1 \in D) \sqsubseteq (d_2 \in D)$ iff $\forall c \in D_{r,\kappa}.\ d_1(c) \sqsubseteq_{\mathcal{L}} d_2(c)$. If
a widening operator $\nabla_{\mathcal{L}}$ has been provided for $\mathcal{L}$, we define a widening operator
$\nabla$ for D as follows: $d_1 \nabla d_2 \equiv \lambda c \in D_{r,\kappa}.\ d_1(c) \nabla_{\mathcal{L}} d_2(c)$.

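We now need to define the abstract transfer functions with signature $D \to D$
for the VCFG edges, to be used within the data flow analysis. As an intermediate
step to this end, we define a ternary relation boundedMove1 as follows. Any triple
of integers $(p, q, s) \in \mathit{boundedMove1}$ iff

$$
\begin{aligned}
(0 \leq p \leq \kappa) \wedge \big(\ & (q \geq 0 \wedge p + q \leq \kappa \wedge s = p + q) \ \vee && \text{(a)} \\
& (q \geq 0 \wedge p + q > \kappa \wedge s = \kappa) \ \vee && \text{(b)} \\
& (q < 0 \wedge p = \kappa \wedge 0 \leq s \leq \kappa \wedge \kappa - s \leq -1 \cdot q) \ \vee && \text{(c)} \\
& (q < 0 \wedge p < \kappa \wedge p + q \geq 0 \wedge s = p + q)\ \big) && \text{(d)}
\end{aligned}
$$

We now define a ternary relation boundedMove on vectors. A triple of vectors
$(c_1, c_2, c_3)$ belongs to relation boundedMove iff all three vectors are of the
same size, and for each index $i$, $(c_1[i], c_2[i], c_3[i]) \in \mathit{boundedMove1}$.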
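
As a concrete reading of the two relations, the following Python sketch encodes cases (a) to (d) directly. The function names, the explicit `kappa` parameter, and the use of tuples for vectors are assumptions made for illustration; they are not taken from the authors' Java prototype.

```python
def bounded_move1(p: int, q: int, s: int, kappa: int) -> bool:
    """Return True iff (p, q, s) is in boundedMove1 for the bound kappa."""
    if not (0 <= p <= kappa):
        return False
    return (
        (q >= 0 and p + q <= kappa and s == p + q)                          # (a)
        or (q >= 0 and p + q > kappa and s == kappa)                        # (b)
        or (q < 0 and p == kappa and 0 <= s <= kappa and kappa - s <= -q)   # (c)
        or (q < 0 and p < kappa and p + q >= 0 and s == p + q)              # (d)
    )


def bounded_move(c1, w, c2, kappa: int) -> bool:
    """Lift bounded_move1 to equally sized vectors (given here as tuples)."""
    return len(c1) == len(w) == len(c2) and all(
        bounded_move1(p, q, s, kappa) for p, q, s in zip(c1, w, c2)
    )
```

With `kappa = 3`, a send from a saturated counter stays at 3 (case (b)), a receive from a saturated counter may lead to 2 or 3 (case (c)), and a receive from an empty counter is rejected because neither (c) nor (d) applies.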


We now define the $D \to D$ transfer function for the VCFG edge
$q_1 \xrightarrow{f,w} q_2$ as follows:

$$
\mathit{fun}(l \in D) = \lambda c_2 \in D_{r,\kappa}.\ \bigsqcup_{c_1 \text{ such that } (c_1, w, c_2) \in \mathit{boundedMove}} f(l(c_1))
$$


Finally, let $l_0$ denote the following function:
$\lambda c \in D_{r,\kappa}.\ \text{if } c \text{ is } \mathbf{0} \text{ then } d_0 \text{ else } \bot$,
where $d_0 \in \mathcal{L}$. We can now invoke Kildall’s algorithm using the fun
transfer functions defined above at all VCFG edges, using $l_0$ as the fact at the
“entry” to the “main” procedure. After Kildall’s algorithm has finished computing
the fix point solution, if $l_v \in D$ is the fix point solution at any node $v$,
we return the value $(\bigsqcup_{c \in D_{r,\kappa}} l_v(c))$ as the final result
at $v$.
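
To make the pieces concrete, here is a minimal sketch of the whole scheme on a toy VCFG with a single counter, assuming a flat constant-propagation lattice for one variable. The graph, the node names, and all helper names are hypothetical. Widening is omitted because this lattice has no infinite ascending chains.

```python
from itertools import product

BOT, TOP = "bot", "top"  # hypothetical encodings of the least and greatest elements


def join(a, b):
    """Join in a flat constant-propagation lattice for a single variable."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def bounded_move1(p, q, s, kappa):
    """Membership test for boundedMove1, cases (a) to (d)."""
    return 0 <= p <= kappa and (
        (q >= 0 and p + q <= kappa and s == p + q)
        or (q >= 0 and p + q > kappa and s == kappa)
        or (q < 0 and p == kappa and 0 <= s <= kappa and kappa - s <= -q)
        or (q < 0 and p < kappa and p + q >= 0 and s == p + q)
    )


def fun(l, f, w, kappa, r):
    """Abstract transfer function of an edge labelled (f, w).

    Facts are dicts from bounded queue configurations to lattice values;
    a missing key stands for BOT (f is assumed to map BOT to BOT).
    """
    out = {}
    for c2 in product(range(kappa + 1), repeat=r):
        acc = BOT
        for c1, val in l.items():
            if all(bounded_move1(p, q, s, kappa) for p, q, s in zip(c1, w, c2)):
                acc = join(acc, f(val))
        if acc != BOT:
            out[c2] = acc
    return out


def forward_dfas(edges, entry, d0, kappa, r):
    """Worklist fix point, followed by the per-node join over all configurations."""
    facts = {entry: {(0,) * r: d0}}  # l_0: d0 at the zero vector, BOT elsewhere
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        for src, f, w, dst in edges:
            if src != node:
                continue
            old = facts.get(dst, {})
            merged = dict(old)
            for c, val in fun(facts[node], f, w, kappa, r).items():
                merged[c] = join(merged.get(c, BOT), val)
            if merged != old:
                facts[dst] = merged
                worklist.append(dst)
    result = {}
    for node, fact in facts.items():
        acc = BOT
        for val in fact.values():
            acc = join(acc, val)
        result[node] = acc
    return result


# Toy graph: the branch through "b" sends one message, the branch through "c"
# sends none, and both branches must receive one message to reach "d".
EDGES = [
    ("a", lambda v: 1, (1,), "b"),   # x := 1; send
    ("a", lambda v: 2, (0,), "c"),   # x := 2
    ("b", lambda v: v, (-1,), "d"),  # receive
    ("c", lambda v: v, (-1,), "d"),  # receive (infeasible: the channel is empty)
]

precise = forward_dfas(EDGES, "a", TOP, kappa=1, r=1)
coarse = forward_dfas(EDGES, "a", TOP, kappa=0, r=1)
```

With `kappa=1` the receive on the branch through c is blocked, so only x = 1 reaches d. With `kappa=0` every count collapses to "zero or more", both branches reach d, and the result at d degrades to unknown.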

The intuition behind the approach above is as follows. If $v$ is a vector in the
set $D_{r,\kappa}$, and if $(c, m)$ is a channel-message pair, then the value in
the $(c, m)$th slot of $v$ encodes the number of instances of message $m$
currently in channel $c$. An important note is that if this value is $\kappa$, it
actually indicates that there are $\kappa$ or more instances of message $m$ in
channel $c$, whereas if the value is less than $\kappa$ it represents itself.
Hence, we can refer to vectors in $D_{r,\kappa}$ as bounded queue configurations.
If $d \in D$ is a data flow fact that holds at a node of the VCFG after data flow
analysis terminates, then for any $v \in D_{r,\kappa}$, if $d(v) = l$, it
indicates that $l$ is a (conservative) over-approximation of the join of the data
flow facts brought by all feasible paths that reach the node such that the counter
values at the ends of these paths are as indicated by $v$ (the notion of what
counter values are indicated by a vector $v \in D_{r,\kappa}$ was described
earlier in this paragraph).

The relation boundedMove is responsible for blocking the propagation along
some of the infeasible paths. The intuition behind it is as follows. Let us
consider a VCFG edge $q_1 \xrightarrow{f,w} q_2$. If $c_1$ is a bounded queue
configuration at node $q_1$, then $c_1$, upon propagation via this edge, will
become a bounded queue configuration $c_2$ at $q_2$ iff
$(c_1, w, c_2) \in \mathit{boundedMove}$. Lines (a) and (b) in the definition of
boundedMove1 correspond to sending a message; line (b) basically throws away
the precise count when the number of messages in the channel goes above $\kappa$.
Line (c) corresponds to receiving a message when all we know is that the number
of messages currently in the channel is greater than or equal to $\kappa$. Line
(d) is key for precision when the channel has fewer than $\kappa$ messages, as it
allows a receive operation to proceed only if the requisite number of messages are
present in the channel.

The formulation above extends naturally to inter-procedural VCFGs using 
generic inter-procedural frameworks such as the call strings approach [55]. We 
omit the details of this in the interest of space. 


Properties of the approach: Since Forward DFAS is an instantiation of
Kildall’s algorithm, it derives its properties from the same. As the set
$D_{r,\kappa}$ is a finite set, it is easy to see that the fix-point algorithm
will terminate.


To argue the soundness of the algorithm, we consider the concrete lattice
$D_c = D_r \to \mathcal{L}$, and the following “concrete” transfer function for
the VCFG edge $q_1 \xrightarrow{f,w} q_2$:

$$
\mathit{fun\_conc}(l \in D_c) = \lambda c_2 \in D_r.\ \Big(\bigsqcup_{c_1 \in D_r \text{ such that } c_1 + w = c_2} f(l(c_1))\Big),
$$

where $D_r$ is the set of all vectors of size $r$ of natural numbers. We then
argue that the abstract transfer function fun defined earlier is a consistent
abstraction [12] of fun_conc. This soundness argument is given in detail in the
appendix that accompanies this paper [4].

If we restrict our discussion to single-procedure systems, the complexity of
our approach is just the complexity of applying Kildall’s algorithm. This works
out to $O(Q^2 \kappa^r h)$, where $Q$ is the number of VCFG nodes, and $h$ is
either the height of the lattice $\mathcal{L}$ or the length of the longest
increasing sequence of values from $\mathcal{L}$ that is obtainable at any point
using the lattice $\mathcal{L}$ in conjunction with Kildall’s algorithm, using the
given widening operation $\nabla_{\mathcal{L}}$.
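
Each data flow fact maps every bounded queue configuration to a lattice value, so the number of such configurations drives both time and memory. As a small worked check (the function name is an assumption for illustration), the number of vectors of size r with entries between 0 and the bound is the bound plus one, raised to the power r:

```python
def num_bounded_configs(r: int, kappa: int) -> int:
    """Number of vectors of size r whose entries are integers in [0, kappa]."""
    return (kappa + 1) ** r


single_counter = num_bounded_configs(r=1, kappa=3)   # one counter, bound 3
ten_counters = num_bounded_configs(r=10, kappa=2)    # ten counters, bound 2
```

This is a worst-case count of table entries per node. A run of the analysis may populate far fewer of them, since unreachable configurations stay at the bottom element.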


[Figure 4 shows eight small tables, labeled (1) through (8). Each table lists,
for one VCFG node and for each counter value, the constant values of the
variables t, x, y, and z.]


Fig. 4. Data flow facts over a run of the algorithm 


Illustration: We illustrate Forward DFAS using the example in Figure 3. Figure 4
depicts the data flow values at four selected nodes as they get updated over
eight selected points of time during the run of the algorithm. In this illustration
we assume a context insensitive analysis for simplicity (it so happens that context
sensitivity does not matter in this specific example). We use the value
$\kappa = 3$. Each small table is a data flow fact, i.e., an element of
$D = D_{r,\kappa} \to \mathcal{L}$. The top-left cell in the table shows the node
at which the fact arises. In each row the first column shows the counter value,
while the remaining columns depict the known constant value of the variables
($\top$ indicates unknown). Here are some interesting things to note. When any
tuple of constant values transfers along the path from node c to node m, the
constant values get updated due to the assignment statements encountered, and this
tuple shifts from counter $i$ to counter $i + 1$ (if $i$ is not already equal to
$\kappa$) due to the “send” operation encountered. When we transition from Step
(5) to Step (6) in the figure, we get $\top$’s, as counter values 2 and 3 in
Step (5) both map to counter value 3 in Step (6) due to $\kappa$ being 3 (hence,
the constant values get joined). The value at node o (in Step (7)) is the join of
values from Steps (5) and (6). Finally, when the value at node o propagates to
node k, the tuple of constants associated with counter value 3 ends up getting
mapped to all lower values as well due to the receive operations encountered.

Note, the precision of our approach in general increases with the value of
$\kappa$ (the running time increases as well). For instance, if $\kappa$ is set
to 2 (rather than 3) in the example, some more infeasible paths would be
traversed. Only $z = 1$ would be inferred at node k, instead of $(t = 1, z = 1)$.


5 Implementation and Evaluation 


We have implemented prototypes of both the Forward DFAS and Backward DFAS 
approaches, in Java. Both the implementations have been parallelized, using the 
ThreadPool library. With Backward DFAS the iterations of the outer “repeat” 
loop in Algorithm 1 run in parallel, while with Forward DFAS propagations of 
values from different nodes to their respective successors happen in parallel. Our
implementations currently target systems without procedure calls, as none of our 
benchmarks had recursive procedure calls. 

Our implementations accept a given system, and a “target” control state q 
in one of the processes of the system at which the JOFP is desired. They then 
construct the VCFG from the system (see Section 2.1), and identify the target 
set of q, which is the set of VCFG nodes in which q is a constituent. For instance, 
in Figure 2, the target set for control state e is {(a,e), (b,e)}. The JOFPs at 
the nodes in the target set are then computed, and the join of these JOFPs is 
returned as the result for q. 
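
The target-set step can be sketched as follows. This is a minimal illustration, assuming that VCFG nodes are tuples of per-process control states with names unique across processes, and that a JOFP is a map from variables to constants. The node list and the JOFP values are hypothetical; only the target set for e matches the Figure 2 example mentioned above.

```python
TOP = "top"  # hypothetical marker for "not a constant"


def join_envs(e1, e2):
    """Pointwise join of two constant-propagation environments."""
    return {v: (e1[v] if e1[v] == e2[v] else TOP) for v in e1}


def target_set(vcfg_nodes, q):
    """VCFG nodes (tuples of control states) in which q is a constituent."""
    return {node for node in vcfg_nodes if q in node}


def result_for(vcfg_nodes, jofp, q):
    """Join of the JOFPs at all nodes in the target set of q.

    Assumes q occurs in at least one node and every target node is reachable.
    """
    envs = [jofp[node] for node in sorted(target_set(vcfg_nodes, q))]
    acc = envs[0]
    for env in envs[1:]:
        acc = join_envs(acc, env)
    return acc


# Hypothetical product nodes and hypothetical JOFP values.
NODES = {("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")}
JOFP = {("a", "e"): {"x": 1, "y": 2}, ("b", "e"): {"x": 1, "y": 3}}

result = result_for(NODES, JOFP, "e")
```

In this made-up instance x is reported as the constant 1 at e, while y is not a constant because the two target nodes disagree.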

Each variable reference in any transition leaving any control state is called a 
“use”. For instance, in Figure 2, the reference to variable x along the outgoing 
transition from state d is one use. In all our experiments, the objective is to find 
the uses that are definitely constants by computing the JOFP at all uses. This 
is a common objective in many research papers, as finding constants enables 
optimizations such as constant folding, and also checking assertions in the code. 
We instantiate Forward DFAS with the Constant Propagation (CP) analysis, and 
Backward DFAS with the LCP analysis (for the reason discussed in Section 3.1). 
We use the bound $\kappa = 2$ in all runs of Forward DFAS, except with two benchmarks
which are too large to scale to this bound. We discuss this later in this section. 
All the experiments were run on a machine with 128GB RAM and four AMD 
Opteron 6386 SE processors (64 cores total). 


5.1 Benchmarks and modeling 


Table 1. Information about the benchmarks. Abbreviations used: (a) prtcl = protocol, 
(b) comm = communication, (c) app = application 


Benchmark            Description                        #Proc  #Var    r  #VCFG nodes
(1)                  (2)                                  (3)   (4)  (5)          (6)
mutex                mutual exclusion example               3     1    6         4536
bartlett             Bartlett’s alternating-bit prtcl       3     3    7        17864
leader               leader election prtcl                  2    11   12        16002
lynch                distorted channel comm prtcl           3     5   27       168912
peterson             Peterson’s mutual exclusion prtcl      3     4    4         6864
boundedAsync         illustrative example                   3     5   10        14375
receive1             illustrative example                   2     5   13         1160
server               actor-based client server app          3     3    6         1232
chameneos            Chameneos concurrency game             3     9   10        45584
replicatingStorage   replicating storage system             4     4    8        47952
event_bus_test       publish-subscribe system               2     2    5          160
jobqueue_test        concurrent job queue system            4     1   10        28800
bookCollectionStore  REST app                               2     2   12         2162
nursery_test         structured concurrency app             3     2    4         1260

We use 14 benchmarks for our evaluations. These are described in the first two
columns of Table 1. Four benchmarks (bartlett, leader, lynch, and peterson) are
Promela models for the Spin model-checker. Three benchmarks (boundedAsync,
receive1, and replicatingStorage) are from the P language repository
(www.github.com/p-org). Two benchmarks (server and chameneos) are from the Basset
repository (www.github.com/SoftwareEngineeringToolDemos/FSE-2010-Basset). Four
benchmarks (event_bus_test, jobqueue_test, nursery_test, and
bookCollectionStore) are real world Go programs. There is one toy example,
“mutex”, for ensuring mutual exclusion via blocking receive messages, that we
have made ourselves. We provide precise links to the benchmarks in the appendix [4].


Our DFAS implementations expect the asynchronous system to be specified 
in an XML format. We have developed a custom XML schema for this, closely 
based on the Promela modeling language used in Spin [26]. We followed this 
direction in order to be able to evaluate our approach on examples from different 
languages. We manually translated each benchmark into an XML file, which we 
call a model. As the input XML schema is close to Promela, the Spin models were 
easily translated. Other benchmarks had to be translated to our XML schema by 
understanding their semantics. 


Note that both our approaches are expensive in the worst-case (exponential 
or worse in the number of counters r). Therefore, we have chosen benchmarks 
that are moderate in their complexity metrics. Still, these benchmarks are real 
and contain complex logic (e.g., the leader election example from Promela, 
which was discussed in detail in Section 1.1). We have also performed some 
manual simplifications to the benchmarks to aid scalability (discussed below). 
Our evaluation is aimed towards understanding the impact on precision due to 
infeasible paths in real benchmarks, and not necessarily to evaluate applicability 
of our approach to large systems. 


We now list some of the simplifications referred to above. Language-specific 
idioms that were irrelevant to the core logic of the benchmark were removed. The 
number of instances of identical processes in some of the models was reduced
in a behavior-preserving manner according to our best judgment. In many of 
the benchmarks, messages carry payload. Usually the payload is one byte. We 
would have needed 256 counters just to encode the payload of one 1-byte message. 
Therefore, in the interest of keeping the analysis time manageable, the payload 
size was reduced to 1 bit or 2 bits. The reduction was done while preserving key 
behavioral aspects according to our best judgment. Finally, procedure calls were 
inlined (there was no use of recursion in the benchmarks). 


In the rest of this section, whenever we say “benchmark”, we actually mean
the model we created corresponding to the benchmark. Table 1 also shows various
metrics of our benchmarks (based on the XML models). Columns 3-6 depict,
respectively, the number of processes, the total number of variables, the number
of “counters” r, and the total number of nodes in the VCFG. We provide our
XML models of all our benchmarks, as well as full output files from the runs of
our approach, as a downloadable folder
(https://drive.google.com/drive/folders/181DloNfm6_UHFyz7qni8rZjwCp-a80CV).
5.2 Data flow analysis results


Table 2. Data flow analysis results 


                                          DFAS Approach                 Baseline Approaches
                     #Var.     #Asserts   #Consts. (4)   #Verified (5)  #Consts. (6)   #Verified (7)
Benchmark (1)        uses (2)  (3)        Forw.  Back.   Forw.  Back.   JOP    CCP     JOP    CCP
mutex                6         2          6      6       2      2       0      0       0      0
bartlett             9         1          0      0       0      0       0      0       0      0
leader               54        4          20     6       4      0       6      6       2      0
lynch                6         2          4      3       0      0       4      3       0      0
peterson             14        2          0      0       0      0       0      0       0      0
boundedAsync         24        8          8      8       0      0       8      8       0      0
receive1             9         5          8      8       4      4       2      8       2      4
server               4         1          0      0       0      0       0      0       0      0
chameneos            35        2          2      2       0      0       2      2       0      0
replicatingStorage   8         1          2      0       1      0       0      0       0      0
event_bus_test       5         3          3      3       3      3       0      2       0      2
jobqueue_test        3         1          0      1       0      1       0      0       0      0
bookCollectionStore  10        8          8      10      6      8       0      8       0      6
nursery_test         2         2          2      2       2      2       0      2       0      2
Total                189       42         63     49      22     20      22     39      4      14


We structure our evaluation as a set of research questions (RQs) below. Table 2
summarizes results for the first three RQs, while Table 3 summarizes results for
RQ 4.


RQ 1: How many constants are identified by the Forward and Backward DFAS 
approaches? Column (2) in Table 2 shows the number of uses in each benchmark. 
Columns (4)-Forw and (4)-Back show the number of uses identified as constants 
by the Forward and Backward DFAS approaches, respectively. In total across 
all benchmarks Forward DFAS identifies 63 constants whereas Backward DFAS 
identifies 49 constants. 

Although in aggregate Backward DFAS appears weaker than Forward DFAS, 
Backward DFAS infers more constants than Forward DFAS in two benchmarks 
— jobqueue_test and bookCollectionStore. Therefore, the two approaches are 
actually incomparable. The advantage of Forward DFAS is that it can use 
relatively more precise analyses like CP that do not satisfy the assumptions 
of Backward DFAS, while the advantage of Backward DFAS is that it always 
computes the precise JOFP. 


RQ 2: How many assertions are verified by the approaches? Verifying assertions 
that occur in code is a useful activity as it gives confidence to developers. All but 
one of our benchmarks had assertions (in the original code itself, before modeling). 
We carried over these assertions into our models. For instance, for the benchmark
leader, the assertion appears in Line 11 in Figure 1. In some benchmarks, like
jobqueue_test, the assertions were part of test cases. It makes sense to verify
these assertions as well, as unlike in testing, our technique considers all possible 
interleavings of the processes. As “bookCollectionStore” did not come with any 
assertions, a graduate student who was unfamiliar with our work studied the 
benchmark and suggested assertions. 

Column (3) in Table 2 shows the number of assertions present in each bench- 
mark. Columns (5)-Forw and (5)-Back in Table 2 show the number of assertions 
declared as safe (i.e., verified) by the Forward and Backward DFAS approaches, 
respectively. An assertion is considered verified iff constants (as opposed to
“$\top$”) are inferred for all the variables used in the assertion, and if these
constants satisfy the assertion. As can be seen from the last row in Table 2, both
approaches verify a substantial percentage of all the assertions: 52% by Forward
DFAS and 48% by Backward DFAS. We believe these results are surprisingly useful, given
that our technique needs no loop invariants or usage of theorem provers. 
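
The verification criterion just described can be sketched as below. This is an illustration only: the representation of an assertion as a list of variables plus a Python predicate, and the names used, are assumptions rather than the format used by the prototype.

```python
TOP = "top"  # hypothetical marker for "not a constant"


def is_verified(assertion_vars, predicate, inferred):
    """An assertion is verified iff every variable it uses is a known
    constant and those constants satisfy the assertion."""
    values = {}
    for var in assertion_vars:
        const = inferred.get(var, TOP)
        if const == TOP:
            return False  # some variable is not a constant: cannot verify
        values[var] = const
    return bool(predicate(values))


# Illustrative inferred facts at some program point.
inferred = {"t": 1, "z": 1, "x": TOP}

ok = is_verified(["t", "z"], lambda v: v["t"] == v["z"], inferred)
unknown = is_verified(["x"], lambda v: v["x"] == 1, inferred)
violated = is_verified(["t"], lambda v: v["t"] == 0, inferred)
```

Note that a result of `False` only means "not verified". It does not distinguish an assertion that may fail from one that the analysis was too imprecise to prove.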


RQ 3: Are the DFAS approaches more precise than baseline approaches? We 
compare the DFAS results with two baseline approaches. The first baseline is 
a Join-Over-all-Paths (JOP) analysis, which basically performs CP analysis on 
the VCFG without eliding any infeasible paths. Columns (6)-JOP and (7)-JOP 
in Table 2 show the number of constants inferred and the number of assertions 
verified by the JOP baseline. It can be seen that Backward DFAS identifies 2.2 
times the number of constants as JOP, while Forward DFAS identifies 2.9 times 
the number of constants as JOP (see columns (4)-Forw, (4)-Back, and (6)-JOP 
in the Total row in Table 2). In terms of assertions, Forward DFAS and Backward
DFAS verify 5.5 times and 5 times as many assertions as JOP, respectively (see
columns (5)-Forw, (5)-Back, and (7)-JOP in the Total row in Table 2). It is clear
from the results that eliding infeasible paths is extremely important for precision.

The second baseline is Copy Constant Propagation (CCP) [50]. This is another 
variant of constant propagation that is even less precise than LCP. However, it 
is based on a finite lattice, specifically, an IFDS [50] lattice. Hence this baseline 
represents the capability of the closest related work to ours [29], which elides 
infeasible paths but supports only IFDS lattices, which are a sub-class of finite 
lattices. (Their implementation also used a finite lattice of predicates, but we are 
not aware of a predicate-identification tool that would work on our benchmarks 
out of the box.) We implemented the CCP baseline within our Backward DFAS 
framework. This baseline hence computes the JOFP using CCP (i.e., it elides 
infeasible paths). 

Columns (6)-CCP and (7)-CCP in Table 2 show the number of constants 
inferred and the number of assertions verified by the CCP baseline. From the 
Total row in Table 2 it can be seen that Forward DFAS finds 62% more constants 
than CCP, while Backward DFAS finds 26% more constants than CCP. With 
respect to number of assertions verified, the respective gains are 57% and 43%. 
In other words, infinite domains such as CP or LCP can give significantly more 
precision than closely related finite domains such as CCP. 
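
The ratios and percentages quoted for RQ 2 and RQ 3 follow directly from the Total row of Table 2. The short script below recomputes them; the dictionary layout and helper name are just a convenient encoding chosen for this check.

```python
# Totals copied from the last row of Table 2.
ASSERTS = 42
CONSTS = {"forw": 63, "back": 49, "jop": 22, "ccp": 39}
VERIFIED = {"forw": 22, "back": 20, "jop": 4, "ccp": 14}


def percent_more(a: int, b: int) -> int:
    """How many percent larger a is than b, rounded to the nearest integer."""
    return round(100 * (a - b) / b)


summary = {
    # Share of all assertions verified (RQ 2).
    "verified_pct": (round(100 * VERIFIED["forw"] / ASSERTS),
                     round(100 * VERIFIED["back"] / ASSERTS)),
    # Constants relative to the JOP baseline (RQ 3).
    "consts_vs_jop": (round(CONSTS["forw"] / CONSTS["jop"], 1),
                      round(CONSTS["back"] / CONSTS["jop"], 1)),
    # Verified assertions relative to the JOP baseline (RQ 3).
    "verified_vs_jop": (VERIFIED["forw"] / VERIFIED["jop"],
                        VERIFIED["back"] / VERIFIED["jop"]),
    # Gains over the CCP baseline, in percent (RQ 3).
    "consts_vs_ccp_pct": (percent_more(CONSTS["forw"], CONSTS["ccp"]),
                          percent_more(CONSTS["back"], CONSTS["ccp"])),
    "verified_vs_ccp_pct": (percent_more(VERIFIED["forw"], VERIFIED["ccp"]),
                            percent_more(VERIFIED["back"], VERIFIED["ccp"])),
}
```

In each pair the first entry is for Forward DFAS and the second for Backward DFAS.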
Table 3. Execution time in seconds


       mut.   bar.   lea.   lyn.   pet.   bou.   rec.   ser.   cha.   rep.   eve.   job.   boo.   nur.
Forw    1.2   14.0    1.3    8.0    1.2   21.0    1.2    1.2   18.0    2.4    1.2    1.2    1.2    1.2
Back    5.0   11.0  284.0  118.0   13.0   21.0    8.0    3.0  220.0   21.0    3.0  140.0   16.0    1.0
JOP     1.2    1.3    1.6    8.0    1.2    1.4    1.3    1.2    3.1    3.0    1.1    1.4    1.2    1.2
CCP     5.0   12.0  226.0  116.0   12.0   14.0    8.0    3.0  156.0   24.0    3.0   51.0   30.0    1.0


RQ 4: How does the execution cost of DFAS approaches compare to the cost of 
the JOP baseline? The columns in Table 3 correspond to the benchmarks (only the
first three letters of each benchmark’s name are shown in the interest of space).
The rows show the running times for Forward DFAS, Backward DFAS, JOP 
baseline, and CCP baseline, respectively. 

The JOP baseline was quite fast on almost all benchmarks (except lynch). 
This is because it maintains just a single data flow fact per VCFG node, in 
contrast to our approaches. Forward DFAS was generally quite efficient, except
on chameneos and lynch. On these two benchmarks, it scaled only with $\kappa = 1$
and $\kappa = 0$, respectively, encountering memory-related crashes at higher
values of $\kappa$ (we used $\kappa = 2$ for all other benchmarks). These two
benchmarks have a large number of nodes and a high value of r, which increases the
size of the data flow facts.

The running time of Backward DFAS is substantially higher than the JOP 
baseline. One reason for this is that being a demand-driven approach, the approach 
is invoked separately for each use (Table 2, Col. 2), and the cumulative time 
across all these invocations is reported in the table. In fact, the mean time per 
query for Backward DFAS is less than the total time for Forward DFAS on 9 out 
of 14 benchmarks, in some cases by a factor of 20x. Also, unlike Forward DFAS, 
Backward DFAS visits a small portion of the VCFG in each invocation. Therefore, 
Backward DFAS is more memory efficient and scales to all our benchmarks. Every 
invocation of Backward DFAS consumed less than 32GB of memory, whereas with 
Forward DFAS, three benchmarks (leader, replicatingStorage, and jobqueue_ test) 
required more than 32GB, and two (lynch and chameneos) needed more than the 
128 GB that was available in the machine. On the whole, the time requirement 
of Backward DFAS is still acceptable considering the large precision gain over 
the JOP baseline. 
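
The per-query comparison can be rechecked from the published numbers: dividing each Backward DFAS time in Table 3 by the number of uses in Table 2 (Column 2) gives the mean time per query. The script below does this using the three-letter benchmark abbreviations of Table 3; the variable names are chosen for this check only. Keep in mind that the Forward DFAS times for lyn. and cha. were obtained with smaller bounds than the other benchmarks.

```python
# Number of uses per benchmark (Table 2, Column 2).
USES = {"mut": 6, "bar": 9, "lea": 54, "lyn": 6, "pet": 14, "bou": 24, "rec": 9,
        "ser": 4, "cha": 35, "rep": 8, "eve": 5, "job": 3, "boo": 10, "nur": 2}
# Total running times in seconds (Table 3).
FORW = {"mut": 1.2, "bar": 14.0, "lea": 1.3, "lyn": 8.0, "pet": 1.2, "bou": 21.0,
        "rec": 1.2, "ser": 1.2, "cha": 18.0, "rep": 2.4, "eve": 1.2, "job": 1.2,
        "boo": 1.2, "nur": 1.2}
BACK = {"mut": 5.0, "bar": 11.0, "lea": 284.0, "lyn": 118.0, "pet": 13.0,
        "bou": 21.0, "rec": 8.0, "ser": 3.0, "cha": 220.0, "rep": 21.0, "eve": 3.0,
        "job": 140.0, "boo": 16.0, "nur": 1.0}

# Mean Backward DFAS time per query (one query per use).
mean_per_query = {b: BACK[b] / USES[b] for b in USES}

# Benchmarks where one Backward query is cheaper than the whole Forward run.
cheaper = sorted(b for b in USES if mean_per_query[b] < FORW[b])

# Largest ratio between the Forward total and the Backward per-query mean.
best_factor = max(FORW[b] / mean_per_query[b] for b in cheaper)
```

Nine of the 14 benchmarks qualify, and the largest ratio (for bou.) is 24, which is consistent with the "9 out of 14" and "factor of 20x" statements above.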


5.3 Limitations and Threats to Validity 


The results of the evaluation using our prototype implementation are very 
encouraging, in terms of both usefulness and efficiency. The evaluation does 
however pose some threats to the validity of our results. The benchmark set, 
though extracted from a wide set of sources, may not be exhaustive in its idioms. 
Also, while modeling, we had to simplify some of the features of the benchmarks 
in order to let the approaches scale. Therefore, applicability of our approach 
directly on real systems with all their language-level complexities, use of libraries, 
etc., is not yet established, and would be a very interesting line of future work. 
6 Related Work


The modeling and analysis of parallel systems, which include asynchronous 
systems, multi-threaded systems, distributed systems, event-driven systems, etc., 
has been the focus of a large body of work, for a very long time. We discuss some 
of the more closely related previous work, by dividing the work into four broad 
categories. 


Data Flow Analysis: The work of Jhala et al. [29] is the closest work that 
addresses similar challenges as our work. They combine the Expand, Enlarge and 
Check (EEC) algorithm [21] that answers control state reachability in WSTS [18], 
with the unordered channel abstraction, and the IFDS [50] algorithm for data 
flow analysis, to compute the JOFP solution for all nodes. They admit only IFDS
abstract domains, which are finite by definition. Some recent work has extended 
this approach for analyzing JavaScript [60] and Android [45] programs. Both our 
approaches are dissimilar to theirs, and we admit infinite lattices (like CP and 
LCP). On the other hand, their approach is able to handle parameter passing 
between procedures, which we do not. 

Bronevetsky et al. [8] address generalized data flow analysis of a very restricted 
class of systems, where any receive operation must receive messages from a specific 
process, and channel contents are not allowed to cause non-determinism in control 
flow. Other work has addressed analysis of asynchrony in web applications [28,42]. 
These approaches are efficient, but over-approximate the JOFP by eliding only 
certain specific types of infeasible paths. 


Formal Modeling and Verification: Verification of asynchronous systems has 
received a lot of attention over a long time. VASS [31] and Petri nets [49] 
(which both support unordered channel abstraction) have been used widely to 
model parallel and asynchronous processes [31,38,54,29,19,5]. Different analysis 
problems based on these models have been studied, such as reachability of 
configurations [7,43,34,35], coverability and boundedness [31,3,2,18,21,6], and 
coverability in the presence of stacks or other data structures [57,5,9,10,40]. 

The coverability problem mentioned above is considered equivalent to con- 
trol state reachability, and has received wide attention [1,14,29,19,54,20,33,5,56]. 
Abdulla et al. [3] were the first to provide a backward algorithm to answer cover- 
ability. Our Backward DFAS approach is structurally similar to their approach, 
but is a strict generalization, as we incorporate data flow analysis using infinite 
abstract domains. (It is noteworthy that when the abstract domain is finite, then 
data flow analysis can be reduced to coverability.) One difference is that we use 
the unordered channel abstraction, while they use the lossy channel abstraction. 
It is possible to modify our approach to use lossy channels as well (when there 
are no procedure calls, which they also do not allow); we omit the formalization 
of this due to lack of space. 

Bouajjani and Emmi [5] generalize over previous coverability results by 
solving the coverability problem for a class of multi-procedure systems called 
recursively parallel programs. Their class of systems is somewhat broader than 
ours, as they allow a caller to receive the messages sent by its callees. Our
COMPUTEENDTOEND routine in Algorithm 2 is structurally similar to their 
approach. They admit finite abstract domains only. It would be interesting future 
work to extend the Backward DFAS approach to their class of systems. 


Our approaches explore all interleavings between the processes, following the
Spin semantics. In contrast, the closest previous approaches [29,5] only address
“event-based” systems, wherein a set of processes execute sequentially without
interleaving at the statement level, but over an unbounded schedule (i.e., each
process executes from start to finish whenever it is scheduled).


Other forms of verification: Proof-based techniques have been explored for
verifying asynchronous and distributed systems [24,58,47,22]. These techniques need
inductive invariants and are not as user-friendly as data flow analysis techniques.
Behavioral types have been used to tackle specific analysis problems such as 
deadlock detection and correct usage of channels [36,37,52]. 


Testing and Model Checking: Languages and tools such as Spin and Promela [26], 
P [15], P# [13], and JPF-Actor [39] have been used widely to model-check 
asynchronous systems. A lot of work has been done in testing of asynchronous 
systems [16,13,53,23,59] as well. Such techniques are bounded in nature and 
cannot provide the strong verification guarantees that data flow analysis provides. 


7 Conclusions and Future Work 


In spite of the substantial body of work on analysis and verification of distributed 
systems, there is no existing approach that performs precise data flow analysis of 
such systems using infinite abstract domains, which are otherwise very commonly 
used with sequential programs. We propose two data flow analysis approaches 
that solve this problem — one computes the precise JOFP solution always, while 
the other one admits a fully general class of infinite abstract domains. We have 
implemented our approaches, analyzed 14 benchmarks using the implementation, 
and have observed substantially higher precision from our approach over two 
different baseline approaches. 


Our approach can be extended in many ways. One interesting extension would 
be to make Backward DFAS work with infinite height lattices, using widening. 
Another possible extension could be the handling of parameters in procedure calls. 
There is significant scope for improving the scalability using better engineering, 
especially for Forward DFAS. One could explore the integration of partial-order 
reduction [11] into both our approaches. Finally, we would like to build tools 
based on our approach that apply directly to programs written in commonly-used 
languages for distributed programming. 
Abstract. Type systems as a technique to analyse or control programs
have been extensively studied for functional programming languages. In
particular, some systems make it possible to extract a complexity bound on
the program from a typing derivation. We explore how to extend such results
to parallel complexity in the setting of the pi-calculus, considered as a 
communication-based model for parallel computation. Two notions of 
time complexity are given: the total computation time without paral- 
lelism (the work) and the computation time under maximal parallelism 
(the span). We define operational semantics to capture those two notions, 
and present two type systems from which one can extract a complexity 
bound on a process. The type systems are inspired both by size types 
and by input/output types, with additional temporal information about 
communications. 


Keywords: Type Systems · Pi-calculus · Process Calculi · Complexity
Analysis · Implicit Computational Complexity · Size Types


1 Introduction 


The problem of certifying time complexity bounds for programs is a challenging 
question, related to the problem of statically inferring time complexity, and it 
has been extensively studied in the setting of sequential programming languages. 
One particular approach to these questions is that of type systems, which offers 
the advantage of providing an analysis which is formally-grounded, compositional 
and modular. In the functional framework several rich type systems have been 
proposed, such that if a program can be assigned a type, then one can extract 
from the type derivation a complexity bound for its execution on any input 
(see e.g. [21,25,22,20,6,4]). The type system itself thus provides a complexity 
certification procedure, and if a type inference algorithm is also provided one 
obtains a complexity inference procedure. This research area is also related to 
implicit computational complexity, which aims at providing type systems or 
static criteria to characterize some complexity classes within a programming 
language (see e.g. [24,13,33,18,15]), and which have sometimes in a second step 
inspired a complexity certification or inference procedure. 

However, while the topic of complexity certification has been thoroughly
investigated for sequential programs, both for space and time bounds, there
have been only a few contributions in the settings of parallel programs and
distributed systems. In these contexts, several notions of cost can be of interest
to abstract the computation time. First, one may wish to know the total cumulated
computation time on all processors during a program execution. This is called
the work of the program. Second, one may ask what the execution time of the
program would be if an infinite number of processors were available and the
program were maximally parallelized. This is called the span or depth of the program.

The paper [23] has addressed the problem of analysing the time complexity
of programs written in a parallel first-order functional language. In this language
one can spawn computations in parallel and use the resulting values in the body
of the program. This makes it possible to express a large number of classical
parallel algorithms. Their approach is based on amortized complexity and builds on
a line of work in the setting of sequential languages to define type systems, which
allow one to derive bounds on the work and the span of the program. However, the
language they are investigating does not allow communication between those
computations in parallel. Our goal is to provide an approach to analyse the time
complexity of programs written in a rich language for communication-based parallel
computation, allowing the representation of several synchronization features. For
that we use the π-calculus, a process calculus which provides process creation,
channel name creation and name-passing in communication. An alternative approach
could be to use a language described with session types, as in [9,10]. We will
discuss the expressivity for both languages in Section 4.2.

We want to propose methods that, given a parallel program written in the
π-calculus, allow one to derive upper bounds on its work and span. Let us mention
that these notions are not only of theoretical interest. Some classical results
provide upper bounds, expressed by means of the work (w) and span (s), on the
evaluation time of a parallel program on a given number p of processors. For
instance, such a program can be evaluated on a shared-multiprocessor system
(SMP) with p processors in time O(max(w/p, s)) (see e.g. [19]).
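
To get a feeling for the quantity max(w/p, s), the following Python sketch evaluates it for several processor counts. The function name `smp_time_bound` and the sample values of w and s are assumptions made for this illustration only, and the constant hidden in the O(.) notation is ignored.

```python
def smp_time_bound(work: int, span: int, processors: int) -> float:
    """Return max(w/p, s), the quantity inside the O(.) bound.

    Only the shape of the bound is computed; constant factors are ignored.
    """
    if processors < 1:
        raise ValueError("at least one processor is required")
    return max(work / processors, span)


if __name__ == "__main__":
    # Hypothetical program with work w = 1000 and span s = 10.
    for p in (1, 10, 100, 1000):
        print(p, smp_time_bound(1000, 10, p))
```

Once p exceeds w/s, adding processors no longer improves the bound: the span becomes the limiting term.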

Our goal in this paper is essentially fundamental and methodological, in the
sense that we aim at proposing type systems which are general enough,
well-behaved and provide good complexity properties. At this stage we do not
yet focus on the design and efficiency of type inference algorithms.

We want to be able to derive complexity bounds which are parametric in the
size of inputs, for instance bounds which depend on the length of a list. For
that it will be useful to have a language of types that can carry information
about sizes, and for this reason we take inspiration from size types [26,6]. So
data-types will be annotated with an index which provides some information on
the size of values. Our approach then follows the standard approach to typing in
the π-calculus, namely typing a channel by providing the types of the messages
that can be sent or received through it. A second ingredient will be necessary
for us: input/output types. In this setting a channel is given a set of
capabilities: it can be an input, an output, or have both input and output
capabilities.

Contributions. We consider a π-calculus with an explicit tick construction;
this allows one to specify several cost models, instead of only counting the
number of reduction steps. Two semantics of this π-calculus are proposed to
define formally the work and the span of a process. We then design two type
systems for the π-calculus, one for the work and one for the span, and establish
for both a soundness theorem: if a process is well-typed in the first (resp.
second) type system, then its type provides an expression which, for its
execution on any input, bounds the work (resp. span). This approach by type
system is generic: the soundness proof relies on subject reduction, and it gives
a compositional and flexible result that could be adapted to extensions of the
base language.

Discussion. Note that even though one of the main usages of the π-calculus is
to specify and analyse concurrent systems, the present paper does not aim at
analysing the complexity of arbitrary concurrent π-calculus programs. Indeed,
some typical examples of concurrent systems, like semaphores, will simply not be
typable in the system for span (see Sect. 4.2), because of linearity conditions.
As explained above, our interest here is instead focused on parallel computation
expressed in the π-calculus, which can include some form of cooperative
concurrency. We believe the analysis of complexity bounds for the concurrent
π-calculus is another challenging question, which we want to address in future
work.

A comparison with related work is given in Sect. 6.


2 The Pi-calculus with Semantics for Work and Span 


In this work, we consider the π-calculus as a model of parallelism. The main
points of the π-calculus are that processes can be composed in parallel,
communication between processes happens with the use of channels, and channel
names can be created dynamically.


2.1 Syntax, Congruence and Standard Semantics for π-Calculus


We present here a classical syntax for the asynchronous π-calculus. More details
about the π-calculus and variants of the syntax can be found in [34]. We define
the sets of variables, expressions and processes by the following grammar.


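\[
\begin{array}{rcl}
v &:=& x, y, z \mid a, b, c \qquad\qquad e \;:=\; v \mid 0 \mid s(e) \mid [\,] \mid e :: e' \\
P, Q &:=& 0 \mid (P \mid Q) \mid\; !a(\tilde{v}).P \mid a(\tilde{v}).P \mid \overline{a}\langle\tilde{e}\rangle \mid (\nu a)P \mid \mathtt{tick}.P \\
&\mid& \mathtt{match}(e)\ \{0 \mapsto P;;\ s(x) \mapsto Q\} \mid \mathtt{match}(e)\ \{[\,] \mapsto P;;\ x :: y \mapsto Q\}
\end{array}
\]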


Variables x, y, z denote base type variables; they represent integers or lists.
Variables a, b, c denote channel names. The notation $\tilde{v}$ stands for a
sequence of variables $v_1, v_2, \ldots, v_k$. In the same way, $\tilde{e}$ is a
sequence of expressions. We work up to α-renaming, and we write
$P[\tilde{v} := \tilde{e}]$ to denote the substitution of the free variables
$\tilde{v}$ in P by $\tilde{e}$. For the sake of simplicity, we consider only
integers and lists as base types in the following, but the results can be
generalized to other algebraic data-types.

Intuitively, P | Q stands for the parallel composition of P and Q. The process
$a(\tilde{v}).P$ represents an input: it stands for the reception on the channel
a of a tuple of values, bound to the variables $\tilde{v}$ in the continuation P.
The process $!a(\tilde{v}).P$ is a replicated version of $a(\tilde{v}).P$: it
behaves like an infinite number of copies of $a(\tilde{v}).P$ in parallel. The
process $\overline{a}\langle\tilde{e}\rangle$ represents an output: it sends a
sequence of expressions on the channel a. A process $(\nu a)P$ dynamically
creates a new channel name a and then proceeds as P. We also have classical
pattern matching on data types. Finally, in tick.P, the tick incurs an
additional cost of one. This constructor is the source of time complexity in a
program. It can represent different cost models and it is more general than only
counting the number of reduction steps. For example, by adding a tick after each
input, we can count the number of communications in a process. By adding it
after each replicated input on a channel a, we can count the number of calls to
a. And if we want to count the number of reduction steps, we can add a tick
after each input and each pattern matching.
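
The first cost model mentioned above (counting communications) can be sketched as a syntactic transformation. The tuple encoding of processes used here, with tags such as `"in"`, `"rin"` and `"out"`, and the function name `add_tick_after_inputs` are assumptions made for this illustration; they are not part of the calculus itself.

```python
# Hypothetical tuple encoding of processes (an assumption of this sketch):
#   ("nil",)                       0
#   ("par", P, Q)                  P | Q
#   ("in", a, params, P)           a(v~).P
#   ("rin", a, params, P)          !a(v~).P
#   ("out", a, exprs)              output of exprs on a
#   ("new", a, P)                  (nu a)P
#   ("tick", P)                    tick.P
#   ("match", e, P, binders, Q)    pattern matching with two branches

def add_tick_after_inputs(proc):
    """Insert a tick in front of the continuation of every input."""
    tag = proc[0]
    if tag in ("nil", "out"):
        return proc
    if tag == "par":
        return ("par", add_tick_after_inputs(proc[1]), add_tick_after_inputs(proc[2]))
    if tag in ("in", "rin"):
        _, chan, params, body = proc
        return (tag, chan, params, ("tick", add_tick_after_inputs(body)))
    if tag == "new":
        return ("new", proc[1], add_tick_after_inputs(proc[2]))
    if tag == "tick":
        return ("tick", add_tick_after_inputs(proc[1]))
    if tag == "match":
        _, expr, first, binders, second = proc
        return ("match", expr, add_tick_after_inputs(first),
                binders, add_tick_after_inputs(second))
    raise ValueError(f"unknown process constructor: {tag}")
```

Restricting the `("in", "rin")` case to replicated inputs on one chosen channel would give the second cost model, the number of calls to that channel.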
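We can now describe the classical semantics for this calculus. We first define
on those processes a congruence relation $\equiv$: this is the least congruence
relation closed under:

\[
\begin{array}{c}
P \mid 0 \equiv P \qquad P \mid Q \equiv Q \mid P \qquad P \mid (Q \mid R) \equiv (P \mid Q) \mid R \\[4pt]
(\nu a)(\nu b)P \equiv (\nu b)(\nu a)P \qquad (\nu a)(P \mid Q) \equiv (\nu a)P \mid Q \ \ \text{(when $a$ is not free in $Q$)}
\end{array}
\]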


Note that the last rule can always be applied from right to left by α-renaming.
Also, one can see that contrary to the usual congruence relation for the
π-calculus, we do not consider the rule for replicated input
($!P \equiv\; !P \mid P$) as it will be captured by the semantics, and
α-conversion is not taken as an explicit rule in the congruence. By
associativity, we will often write parallel composition for any number of
processes and not only two. Another way to see this congruence relation is that,
up to congruence, a process is entirely described by a set of channel names and
a multiset of processes. Formally, we can give the following definition.


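Definition 1 (Guarded Processes and Canonical Form). A process G is
guarded if it has one of the following shapes:

\[
\begin{array}{rcl}
G &:=& !a(\tilde{v}).P \mid a(\tilde{v}).P \mid \overline{a}\langle\tilde{e}\rangle \mid \mathtt{tick}.P \\
&\mid& \mathtt{match}(e)\ \{0 \mapsto P;;\ s(x) \mapsto Q\} \mid \mathtt{match}(e)\ \{[\,] \mapsto P;;\ x :: y \mapsto Q\}
\end{array}
\]

We say that a process is in canonical form if it has the form
$(\nu\tilde{a})(G_1 \mid \cdots \mid G_n)$ with $G_1, \ldots, G_n$ guarded processes.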


The properties of this canonical form can be found in the technical report
[5]; here we only use it to give an intuition of how one could understand a
process. Thus, it is enough to consider that for each process P, there is a
process in canonical form congruent to P. Moreover, this canonical form is
unique up to the ordering of names and processes, and up to congruence inside
guarded processes.

We can now define the usual reduction relation for the π-calculus, which we
denote $P \rightarrow Q$. It is defined by the rules given in Figure 1. The rules
for integers are not detailed as they can be deduced from the ones for lists.
Remark that substitution should be well-defined in order to do some reduction
steps: channel names must be substituted by other channel names and base type
variables can be substituted by any expression except channel names. However,
when we consider typed processes, this will always yield well-defined
substitutions.


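\[
\begin{array}{c}
!a(\tilde{v}).P \mid \overline{a}\langle\tilde{e}\rangle \rightarrow\; !a(\tilde{v}).P \mid P[\tilde{v} := \tilde{e}]
\qquad
a(\tilde{v}).P \mid \overline{a}\langle\tilde{e}\rangle \rightarrow P[\tilde{v} := \tilde{e}] \\[6pt]
\mathtt{match}([\,])\ \{[\,] \mapsto P;;\ x :: y \mapsto Q\} \rightarrow P \\[6pt]
\mathtt{match}(e :: e')\ \{[\,] \mapsto P;;\ x :: y \mapsto Q\} \rightarrow Q[x, y := e, e'] \\[6pt]
\frac{P \rightarrow Q}{P \mid R \rightarrow Q \mid R}
\qquad
\frac{P \rightarrow Q}{(\nu a)P \rightarrow (\nu a)Q}
\qquad
\frac{P \equiv P' \quad P' \rightarrow Q' \quad Q' \equiv Q}{P \rightarrow Q}
\end{array}
\]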


Fig. 1. Standard Reduction Rules 


For now, this relation cannot reduce a process of the form tick.P. So, we 
need to introduce a reduction rule for tick. From this semantics, we will define a 
reduction corresponding to total complexity (work). Then, we will define parallel 
complexity (span) by taking an expansion of the standard reduction. 


2.2 Semantics and Complexity 


Work. We first describe a semantics for the work, that is to say the total number
of ticks during a reduction without parallelism. The time reduction
$\rightarrow_1$ is defined in Figure 2. Intuitively, this reduction removes
exactly one tick at the top-level.


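\[
\begin{array}{c}
\mathtt{tick}.P \rightarrow_1 P \\[6pt]
\frac{P \rightarrow_1 P'}{P \mid Q \rightarrow_1 P' \mid Q}
\qquad
\frac{Q \rightarrow_1 Q'}{P \mid Q \rightarrow_1 P \mid Q'}
\qquad
\frac{P \rightarrow_1 P'}{(\nu a)P \rightarrow_1 (\nu a)P'}
\end{array}
\]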


Fig. 2. Simple Tick Reduction Rules 


Then from any process P, a sequence of reduction steps to Q is just a sequence
of one-step reductions with $\rightarrow$ or $\rightarrow_1$, and the work
complexity of this sequence is the number of $\rightarrow_1$ steps. In this
paper, we always consider the worst-case complexity, so the work of a process is
defined as the maximal complexity over all such sequences of reduction steps
from this process.

Notice that with this semantics for work, adding a tick in a process does not
change its behaviour: we neither create nor erase reduction paths.


Span. A more interesting notion of complexity in this calculus is the parallel 
one. Before presenting the semantics, we present with some simple examples 
what kind of properties we want for this parallel complexity. 

First, we want a parallel complexity that works as if we had an infinite
number of processors. So, on the process
$\mathtt{tick}.0 \mid \mathtt{tick}.0 \mid \mathtt{tick}.0 \mid \cdots \mid \mathtt{tick}.0$
we want the complexity to be 1, whatever the number of ticks in parallel.

Moreover, reductions with a zero-cost complexity (in our setting, this means
all reductions except the reduction of a tick) should not harm this maximal
parallelism. For example
$a().\mathtt{tick}.0 \mid \overline{a}\langle\rangle \mid \mathtt{tick}.0$ should
also have complexity one, because intuitively this synchronization between the
input and the output can be done independently of the tick on the right, and
then the tick on the left can be reduced in parallel with the tick on the right.

Finally, as before for the work, adding a tick should not change the behaviour
of a process. For instance, consider the process
$\mathtt{tick}.a().P_0 \mid a().\mathtt{tick}.P_1 \mid \overline{a}\langle\rangle$,
where a is not used in $P_0$ and $P_1$. This process should have the complexity
$\max(1 + C_0, 1 + C_1)$, where $C_i$ is the cost of $P_i$. Indeed, there are two
possible reductions: either we reduce the tick, then synchronize the left input
with the output, and continue with $P_0$; or we first synchronize the right
input with the output, then reduce the ticks, and finally continue as $P_1$.

A possible way to define such a parallel complexity would be to take causal
complexity [13,12,11]; however, we believe there is a simpler presentation for
our case. In the technical report [5], we prove the equivalence between causal
complexity and the notion presented here. The idea has been proposed by Naoki
Kobayashi (private communication). It consists in introducing a new
construction for processes, m : P, where m is an integer. A process using this
constructor will be called an annotated process. Intuitively, this annotated
process means P preceded by m ticks. We can then enrich the congruence relation
$\equiv$ with the following rules:


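\[
\begin{array}{c}
m : (P \mid Q) \equiv (m : P) \mid (m : Q) \qquad m : (\nu a)P \equiv (\nu a)(m : P) \\[4pt]
m : (n : P) \equiv (m + n) : P \qquad 0 : P \equiv P
\end{array}
\]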


This intuitively means that the ticks can be distributed over parallel com- 
position, name creation can be done before or after ticks without changing the 
semantics, ticks can be grouped together, and finally zero tick is equivalent to 
nothing. 
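
These rules can be read as a procedure that pushes annotations towards the guarded processes. The following Python sketch does so on a hypothetical tuple encoding (`("ann", m, P)` for m : P, `("par", P, Q)`, `("new", a, P)`, `("nil",)`, and any other tag treated as guarded). Both the encoding and the name `canonical` are assumptions of this illustration, and bound names are assumed to be pairwise distinct so that restrictions can be moved to the top.

```python
def canonical(proc, offset=0):
    """Return (names, components): the restricted names and a list of
    (annotation, guarded process) pairs, using the congruence rules."""
    tag = proc[0]
    if tag == "nil":
        # P | 0 == P: the inactive process contributes no component.
        return [], []
    if tag == "ann":
        # m : (n : P) == (m + n) : P
        return canonical(proc[2], offset + proc[1])
    if tag == "par":
        # m : (P | Q) == (m : P) | (m : Q)
        left_names, left_comps = canonical(proc[1], offset)
        right_names, right_comps = canonical(proc[2], offset)
        return left_names + right_names, left_comps + right_comps
    if tag == "new":
        # m : (nu a)P == (nu a)(m : P), then the restriction is moved outwards.
        names, comps = canonical(proc[2], offset)
        return [proc[1]] + names, comps
    # Any other constructor is treated as a guarded process.
    return [], [(offset, proc)]


if __name__ == "__main__":
    example = ("ann", 2, ("par", ("ann", 1, ("tick", ("nil",))),
                                 ("new", "a", ("out", "a", ()))))
    print(canonical(example))
```

On the example, the outer annotation 2 is distributed over both components and merged with the inner annotation 1 on the left.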

With this congruence relation and this new constructor, we can give a new 
shape to the canonical form presented in Definition 1. 


Definition 2 (Canonical Form for Annotated Processes). An annotated
process is in canonical form if it has the shape:

\[
(\nu\tilde{a})(n_1 : G_1 \mid \cdots \mid n_m : G_m)
\]

with $G_1, \ldots, G_m$ guarded annotated processes.


Remark that the congruence relation above allows one to obtain this canonical
form from any annotated process. With this intuition in mind, we can then
define a reduction relation $\Rightarrow_p$ for annotated processes. The rules
are given in Figure 3. We do not detail the rules for integers as they are
deducible from the ones for lists. Intuitively, this semantics works as the
usual semantics for the π-calculus, but when doing a synchronization, we keep
the maximal annotation, and ticks are memorized in the annotations.


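\[
\begin{array}{c}
(n : a(\tilde{v}).P) \mid (m : \overline{a}\langle\tilde{e}\rangle) \Rightarrow_p (\max(m, n) : P[\tilde{v} := \tilde{e}])
\qquad
\mathtt{tick}.P \Rightarrow_p 1 : P \\[6pt]
(n :\; !a(\tilde{v}).P) \mid (m : \overline{a}\langle\tilde{e}\rangle) \Rightarrow_p (n :\; !a(\tilde{v}).P) \mid (\max(m, n) : P[\tilde{v} := \tilde{e}]) \\[6pt]
\mathtt{match}([\,])\ \{[\,] \mapsto P;;\ x :: y \mapsto Q\} \Rightarrow_p P \\[6pt]
\mathtt{match}(e :: e')\ \{[\,] \mapsto P;;\ x :: y \mapsto Q\} \Rightarrow_p Q[x, y := e, e'] \\[6pt]
\frac{P \Rightarrow_p Q}{P \mid R \Rightarrow_p Q \mid R}
\qquad
\frac{P \Rightarrow_p Q}{(\nu a)P \Rightarrow_p (\nu a)Q}
\qquad
\frac{P \Rightarrow_p Q}{(n : P) \Rightarrow_p (n : Q)} \\[10pt]
\frac{P \equiv P' \quad P' \Rightarrow_p Q' \quad Q' \equiv Q}{P \Rightarrow_p Q}
\end{array}
\]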


Fig. 3. Reduction Rules 


We can then define the parallel complexity of an annotated process. 


Definition 3 (Parallel Complexity). Let P be an annotated process. We
define its local complexity $C_\ell(P)$ by:

\[
\begin{array}{ll}
C_\ell(n : P) = n + C_\ell(P) & C_\ell(P \mid Q) = \max(C_\ell(P), C_\ell(Q)) \\[4pt]
C_\ell((\nu a)P) = C_\ell(P) & C_\ell(G) = 0 \ \text{if $G$ is a guarded process}
\end{array}
\]

Equivalently, $C_\ell(P)$ is the maximal integer that appears in the canonical
form of P. Then, for an annotated process P, its global parallel complexity is
given by $\max\{n \mid P \Rightarrow_p^* Q \wedge C_\ell(Q) = n\}$ where
$\Rightarrow_p^*$ is the reflexive and transitive closure of $\Rightarrow_p$.


To show that this parallel complexity is well-behaved, we give the following 
lemma. 


Lemma 1 (Reduction and Local Complexity). Let P, P' be annotated processes
such that $P \Rightarrow_p P'$. Then, we have $C_\ell(P') \ge C_\ell(P)$.

This lemma is proved by induction. The main point is that guarded processes
have a local complexity equal to zero, so a reduction can only increase
this local complexity. Thus, in order to bound the complexity of an annotated
process, we need to reduce it with $\Rightarrow_p$, and then we have to take the
maximum local complexity over all normal forms. Moreover, this semantics
respects the conditions given in the beginning of this section.
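
As a minimal sketch of Definition 3, the following Python code computes the local complexity and, for the restricted case of processes in which no communication or pattern matching can fire, reaches the normal form by applying only the rule for tick. The tuple encoding (`("ann", n, P)`, `("par", P, Q)`, `("new", a, P)`, `("tick", P)`, `("nil",)`) and the names `local_complexity` and `fire_ticks` are assumptions of this illustration; treating 0 as having local complexity zero is also an assumption.

```python
def local_complexity(proc):
    """Local complexity of an annotated process, following Definition 3."""
    tag = proc[0]
    if tag == "ann":
        return proc[1] + local_complexity(proc[2])
    if tag == "par":
        return max(local_complexity(proc[1]), local_complexity(proc[2]))
    if tag == "new":
        return local_complexity(proc[2])
    return 0  # guarded processes (and, by assumption, the inactive process 0)


def fire_ticks(proc):
    """Apply the tick rule (tick.P becomes 1 : P) at every active position,
    including under annotations, until no such tick remains."""
    tag = proc[0]
    if tag == "tick":
        return ("ann", 1, fire_ticks(proc[1]))
    if tag == "ann":
        return ("ann", proc[1], fire_ticks(proc[2]))
    if tag == "par":
        return ("par", fire_ticks(proc[1]), fire_ticks(proc[2]))
    if tag == "new":
        return ("new", proc[1], fire_ticks(proc[2]))
    return proc  # inputs, outputs, matches and 0 are left untouched


if __name__ == "__main__":
    tick0 = ("tick", ("nil",))
    three_in_parallel = ("par", tick0, ("par", tick0, tick0))
    print(local_complexity(fire_ticks(three_in_parallel)))
```

For three ticks in parallel the result is 1, matching the first requirement stated for the span; a process with two ticks in sequence next to a single tick gives 2.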


2.3 An Example Process 


As an example, we show a way to encode a usual functional program in the
π-calculus. In order to do this, we use replicated input to encode functions, and
we use a return channel for the output. So, given a channel f representing a
function F such that $\overline{f}\langle y, a\rangle$ returns F(y) on the
channel a, we can write the "map" function in our calculus as described in
Figure 4. The main idea for this kind of encoding is to use the dynamic creation
of names ν to create the return channel before calling a function, and then to
use this channel to obtain back the result of this call. Note that we chose here
as cost model the number of calls to f, and we can see the versatility of a tick
constructor compared with a complexity that relies only on the number of
reduction steps.

With this process, on a list of length n, the work is n. However, as all calls 
to f could be done in parallel, the span is 1 for any non-empty list as input. 


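\[
y :: x_1 \mapsto (\nu b)(\nu c)\big(\mathtt{tick}.\overline{f}\langle y, b\rangle \mid \overline{map}\langle x_1, f, c\rangle \mid b(z).c(x_2).\overline{a}\langle z :: x_2\rangle\big)
\]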


Fig. 4. The Map Function 
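
The claimed values (work n, span 1 on a non-empty list) can be checked with a small Python sketch that mirrors the recursive structure of the encoding rather than executing the reduction semantics. It assumes, as in the chosen cost model, that each call to f costs exactly one tick and that f itself adds no further cost; the name `map_costs` is hypothetical.

```python
def map_costs(xs):
    """Return (work, span) of the map encoding on the list xs,
    counting one tick per call to f."""
    if not xs:
        return 0, 0                      # empty list: no call to f
    rest_work, rest_span = map_costs(xs[1:])
    call_work, call_span = 1, 1          # tick before the call to f
    # Work adds up over parallel components; span takes their maximum,
    # since the call on the head runs in parallel with the recursive call.
    return call_work + rest_work, max(call_span, rest_span)


if __name__ == "__main__":
    print(map_costs([]))
    print(map_costs(["a", "b", "c"]))
```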


3 Size Types for the Work 


We now define a type system to bound the work of a process. The goal is to 
obtain a soundness result: if a process P is typable then we can derive an integer 
expression K such that the work of P is bounded by K. 


3.1 Size Input/Output Types 


Our type system relies on the definition of indices to keep track of the size of
values in a process. Those indices were for example used in [6] and are greatly
inspired by [26]. The main idea of those types in a sequential setting is to
control recursive calls by ensuring that sizes decrease.

Definition 4. The set of indices for natural numbers is given by the following
grammar.

\[
I, J, K := i, j, k \mid f(I_1, \ldots, I_n)
\]


The variables i, j, k are called index variables. The set of index variables is
denoted $\mathcal{V}$. The symbol f is an element of a given set of function
symbols containing addition and multiplication. We also assume that we have the
subtraction as a function symbol, with n-m = 0 when m > n. Each function symbol
f of arity ar(f) comes with an interpretation
$[\![f]\!] : \mathbb{N}^{ar(f)} \rightarrow \mathbb{N}$.


Given an index valuation $\rho : \mathcal{V} \rightarrow \mathbb{N}$, we extend
the interpretation of function symbols to indices, noted $[\![I]\!]_\rho$, in
the natural way. In an index I, the substitution of the occurrences of i in I by
J is denoted $I\{J/i\}$.
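
A minimal Python sketch of index interpretation and substitution follows. Indices are encoded as integers (numerals, viewed here as nullary function symbols), strings (index variables) or tuples `(symbol, arguments...)`; this encoding and the names `eval_index` and `substitute` are assumptions of the illustration.

```python
import math


def eval_index(index, valuation):
    """Interpret an index under a valuation from index variables to naturals."""
    if isinstance(index, int):          # numeral, seen as a nullary symbol
        return index
    if isinstance(index, str):          # index variable
        return valuation[index]
    op, *args = index
    values = [eval_index(arg, valuation) for arg in args]
    if op == "+":
        return sum(values)
    if op == "*":
        return math.prod(values)
    if op == "-":                       # truncated subtraction: n - m = 0 when m > n
        return max(values[0] - values[1], 0)
    raise ValueError(f"unknown function symbol: {op}")


def substitute(index, var, replacement):
    """Replace every occurrence of the variable `var` in `index` by `replacement`."""
    if isinstance(index, int):
        return index
    if isinstance(index, str):
        return replacement if index == var else index
    op, *args = index
    return (op, *(substitute(arg, var, replacement) for arg in args))
```

For instance, substituting j + 1 for i in k * i and evaluating with k = 2 and j = 3 yields 8.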


Definition 5 (Constraints on Indices). Let $\varphi \subseteq \mathcal{V}$ be a
set of index variables. A constraint C on $\varphi$ is an expression with the
shape $I \bowtie J$ where I and J are indices with free variables in $\varphi$
and $\bowtie$ denotes a binary relation on integers. Usually, we take
$\bowtie\ \in \{\le, <, =, \ne\}$. Finite sets of constraints are denoted $\Phi$.


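For a set $\varphi \subseteq \mathcal{V}$, we say that a valuation
$\rho : \varphi \rightarrow \mathbb{N}$ satisfies a constraint $I \bowtie J$ on
$\varphi$, noted $\rho \vDash I \bowtie J$, when
$[\![I]\!]_\rho \bowtie [\![J]\!]_\rho$ holds. Similarly, $\rho \vDash \Phi$
holds when $\rho \vDash C$ for all $C \in \Phi$. Likewise, we note
$\varphi;\Phi \vDash C$ when for all valuations $\rho$ on $\varphi$ such that
$\rho \vDash \Phi$ we have $\rho \vDash C$. Remark that the order $\le$ in a
context $\varphi;\Phi$ is not total in general, for example
$(i, j);\cdot \nvDash i \le ij$ and $(i, j);\cdot \nvDash ij \le i$.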
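
The non-totality remark can be checked by searching for counterexample valuations. The Python sketch below enumerates valuations over a bounded range; the function name `find_counterexample`, the representation of constraints as Python predicates and the bound are assumptions of this illustration. Such a bounded search can refute an entailment but cannot prove one.

```python
from itertools import product


def find_counterexample(variables, constraints, goal, bound=5):
    """Search valuations in {0, ..., bound}^n that satisfy every constraint
    but violate the goal. Return such a valuation, or None if none is found."""
    for values in product(range(bound + 1), repeat=len(variables)):
        rho = dict(zip(variables, values))
        if all(c(rho) for c in constraints) and not goal(rho):
            return rho
    return None


if __name__ == "__main__":
    # Neither i <= i*j nor i*j <= i holds for all valuations of i and j.
    print(find_counterexample(["i", "j"], [], lambda r: r["i"] <= r["i"] * r["j"]))
    print(find_counterexample(["i", "j"], [], lambda r: r["i"] * r["j"] <= r["i"]))
```

Adding the constraint j >= 1 to the context removes the first counterexample, which illustrates how the set of constraints affects what can be derived.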


Definition 6. The set of base types is given by the following grammar.

\[
\mathcal{B} := \mathsf{Nat}[I, J] \mid \mathsf{List}[I, J](\mathcal{B})
\]

Intuitively, an integer n of type $\mathsf{Nat}[I, J]$ must be such that
$I \le n \le J$. Likewise, a list of type $\mathsf{List}[I, J](\mathcal{B})$ must
have a length between I and J. With those types comes a notion of subtyping, in
order to have some flexibility on bounds. This is described by the rules of
Figure 5. In a subtyping judgement $\varphi;\Phi \vdash T \sqsubseteq T'$ the
free index variables of T, T', $\Phi$ should be included in $\varphi$.


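\[
\frac{\varphi;\Phi \vDash I' \le I \qquad \varphi;\Phi \vDash J \le J'}
     {\varphi;\Phi \vdash \mathsf{Nat}[I, J] \sqsubseteq \mathsf{Nat}[I', J']}
\]

\[
\frac{\varphi;\Phi \vDash I' \le I \qquad \varphi;\Phi \vDash J \le J' \qquad \varphi;\Phi \vdash \mathcal{B} \sqsubseteq \mathcal{B}'}
     {\varphi;\Phi \vdash \mathsf{List}[I, J](\mathcal{B}) \sqsubseteq \mathsf{List}[I', J'](\mathcal{B}')}
\]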


Fig. 5. Subtyping Rules for Base Size Types 


Then, after base types, we have to give a type to channel names in a process.
As we want to generalize subtyping for channel types, we will use input/output
types [34]. Intuitively, in such a type, in addition to the types that can be
sent and received for a channel, a channel is given a set of capabilities:
either it is both an input and output channel, or it has only one of those
capabilities. This is useful in order to use subtyping, as an input channel and
an output channel do not behave in the same way with regard to subtyping.
Indeed, an input/output channel is invariant for subtyping, an input channel is
covariant and an output channel is contravariant. Unlike in usual input/output
types, in this work we also distinguish two kinds of channels: the simple
channels (that we will often call channels), and replicated channels (called
servers).


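Definition 7. The set of types is given by the following grammar.

\[
T := \mathcal{B} \mid \mathsf{ch}(\tilde{T}) \mid \mathsf{in}(\tilde{T}) \mid \mathsf{out}(\tilde{T}) \mid \forall\tilde{i}.\mathsf{serv}^K(\tilde{T}) \mid \forall\tilde{i}.\mathsf{iserv}^K(\tilde{T}) \mid \forall\tilde{i}.\mathsf{oserv}^K(\tilde{T})
\]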


The three different types for channels and servers correspond to the three
different sets of capabilities. We note serv when the server has both
capabilities, iserv when it has only input and oserv when it has only output.
Then, for servers, we have additional information: there is a quantification
over index variables, and the index K stands for the complexity of the process
spawned by this server. A typical example could be a server taking as input a
list and a channel, and sending to this channel the sorted list, in time
$k \cdot n$ where n is the size of the list:
$P =\; !a(x, b). \cdots \overline{b}\langle e\rangle$ where e represents at the
end of the process the list x sorted. Such a server name a could be given the
type
$\forall i.\mathsf{serv}^{k \cdot i}(\mathsf{List}[0, i](\mathcal{B}), \mathsf{out}(\mathsf{List}[0, i](\mathcal{B})))$.
This type means that for all integers i, if given a list of size at most i and
an output channel waiting for a list of size at most i, the process spawned by
this server will stop at time at most $k \cdot i$. Those bound index variables
$\tilde{i}$ are very useful, especially for replicated input. As a replicated
input is made to be used several times with different values, it is useful to
allow this kind of polymorphism on indices. Moreover, if a replicated input is
used to encode a recursion, with this polymorphism we can take into account the
different recursive calls with different values and different complexities.


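\[
\frac{(\varphi, \tilde{i});\Phi \vdash \tilde{T} \sqsubseteq \tilde{U} \qquad (\varphi, \tilde{i});\Phi \vdash \tilde{U} \sqsubseteq \tilde{T} \qquad (\varphi, \tilde{i});\Phi \vDash K = K'}
     {\varphi;\Phi \vdash \forall\tilde{i}.\mathsf{serv}^K(\tilde{T}) \sqsubseteq \forall\tilde{i}.\mathsf{serv}^{K'}(\tilde{U})}
\]

\[
\varphi;\Phi \vdash \forall\tilde{i}.\mathsf{serv}^K(\tilde{T}) \sqsubseteq \forall\tilde{i}.\mathsf{iserv}^K(\tilde{T})
\qquad
\varphi;\Phi \vdash \forall\tilde{i}.\mathsf{serv}^K(\tilde{T}) \sqsubseteq \forall\tilde{i}.\mathsf{oserv}^K(\tilde{T})
\]

\[
\frac{(\varphi, \tilde{i});\Phi \vdash \tilde{T} \sqsubseteq \tilde{U} \qquad (\varphi, \tilde{i});\Phi \vDash K' \le K}
     {\varphi;\Phi \vdash \forall\tilde{i}.\mathsf{iserv}^K(\tilde{T}) \sqsubseteq \forall\tilde{i}.\mathsf{iserv}^{K'}(\tilde{U})}
\qquad
\frac{(\varphi, \tilde{i});\Phi \vdash \tilde{U} \sqsubseteq \tilde{T} \qquad (\varphi, \tilde{i});\Phi \vDash K \le K'}
     {\varphi;\Phi \vdash \forall\tilde{i}.\mathsf{oserv}^K(\tilde{T}) \sqsubseteq \forall\tilde{i}.\mathsf{oserv}^{K'}(\tilde{U})}
\]

\[
\frac{\varphi;\Phi \vdash T \sqsubseteq T' \qquad \varphi;\Phi \vdash T' \sqsubseteq T''}
     {\varphi;\Phi \vdash T \sqsubseteq T''}
\]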


Fig. 6. Subtyping Rules for Server Types 


Then, we describe subtyping for servers in Figure 6. As explained previously,
capabilities modify the variance of types, and a channel can lose capabilities
by subtyping. Subtyping for channel types can be deduced from the rules for
servers. Note that the transitivity rule is not necessary and the subtyping
relation could be exhaustively described. However, in order to reduce the number
of rules, we present subtyping with a transitivity rule. Finally, subtyping can
be extended to contexts, and we write $\Gamma \sqsubseteq \Delta$ when $\Gamma$
and $\Delta$ have the same domain and for each variable $v : T \in \Gamma$ and
$v : T' \in \Delta$, we have $T \sqsubseteq T'$.


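\[
\frac{v : T \in \Gamma}{\varphi;\Phi;\Gamma \vdash v : T}
\qquad
\varphi;\Phi;\Gamma \vdash 0 : \mathsf{Nat}[0, 0]
\qquad
\varphi;\Phi;\Gamma \vdash [\,] : \mathsf{List}[0, 0](\mathcal{B})
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash e : \mathsf{Nat}[I, J]}
     {\varphi;\Phi;\Gamma \vdash s(e) : \mathsf{Nat}[I + 1, J + 1]}
\qquad
\frac{\varphi;\Phi;\Gamma \vdash e : \mathcal{B} \qquad \varphi;\Phi;\Gamma \vdash e' : \mathsf{List}[I, J](\mathcal{B})}
     {\varphi;\Phi;\Gamma \vdash e :: e' : \mathsf{List}[I + 1, J + 1](\mathcal{B})}
\]

\[
\frac{\varphi;\Phi;\Delta \vdash e : U \qquad \varphi;\Phi \vdash \Gamma \sqsubseteq \Delta \qquad \varphi;\Phi \vdash U \sqsubseteq T}
     {\varphi;\Phi;\Gamma \vdash e : T}
\]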


Fig. 7. Typing Rules for Expressions 


We can now present the type system. Rules for expressions are given in
Figure 7. The typing for expressions $\varphi;\Phi;\Gamma \vdash e : T$ means
that under the constraints $\Phi$, in the context $\Gamma$, the expression e can
be given the type T. We use the notation
$\varphi;\Phi;\Gamma \vdash \tilde{e} : \tilde{T}$ for a sequence of typing
judgements for expressions in the tuple $\tilde{e}$.

Then, rules for processes are described in Figure 8 and Figure 9. Figure 9
describes rules specific to work, whereas rules in Figure 8 will be reused for
span. A typing judgement $\varphi;\Phi;\Gamma \vdash P \triangleleft K$
intuitively means that under the constraints $\Phi$, in a context $\Gamma$, a
process P is typable and its work complexity is bounded by K.


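The rules can be seen as a combination of input/output typing rules with
rules found in a size type system. The main differences are that because of the
two kinds of channels, we need two rules for an output. And, for servers,
quantification over index variables should be taken into account. Note that a
replicated input has complexity zero, and it is a call to this server that
generates complexity in the type system. This is because once defined, a
replicated input stays during all the reduction, so we do not want them to
generate complexity. Note also that the pattern matching rules are the only ones
which add constraints in the hypothesis, which provide information on the size
in the typing. This is particularly useful for recursion. Finally, there is an
explicit rule for subtyping, and in this rule we can arbitrarily increase the
index corresponding to the complexity.

\[
\frac{\varphi;\Phi;\Gamma, a : T \vdash P \triangleleft K}
     {\varphi;\Phi;\Gamma \vdash (\nu a)P \triangleleft K}
\]

\[
\frac{\begin{array}{c}
\varphi;(\Phi, I \le 0);\Gamma \vdash P \triangleleft K \\
\varphi;\Phi;\Gamma \vdash e : \mathsf{Nat}[I, J] \qquad \varphi;(\Phi, J \ge 1);\Gamma, x : \mathsf{Nat}[I - 1, J - 1] \vdash Q \triangleleft K
\end{array}}
     {\varphi;\Phi;\Gamma \vdash \mathtt{match}(e)\ \{0 \mapsto P;;\ s(x) \mapsto Q\} \triangleleft K}
\]

\[
\frac{\begin{array}{c}
\varphi;(\Phi, I \le 0);\Gamma \vdash P \triangleleft K \\
\varphi;\Phi;\Gamma \vdash e : \mathsf{List}[I, J](\mathcal{B}) \qquad \varphi;(\Phi, J \ge 1);\Gamma, x : \mathcal{B}, y : \mathsf{List}[I - 1, J - 1](\mathcal{B}) \vdash Q \triangleleft K
\end{array}}
     {\varphi;\Phi;\Gamma \vdash \mathtt{match}(e)\ \{[\,] \mapsto P;;\ x :: y \mapsto Q\} \triangleleft K}
\]

\[
\varphi;\Phi;\Gamma \vdash 0 \triangleleft 0
\qquad
\frac{\varphi;\Phi;\Delta \vdash P \triangleleft K \qquad \varphi;\Phi \vdash \Gamma \sqsubseteq \Delta \qquad \varphi;\Phi \vDash K \le K'}
     {\varphi;\Phi;\Gamma \vdash P \triangleleft K'}
\]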


Fig. 8. Common Typing Rules for Processes 


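\[
\frac{\varphi;\Phi;\Gamma \vdash P \triangleleft K \qquad \varphi;\Phi;\Gamma \vdash Q \triangleleft K'}
     {\varphi;\Phi;\Gamma \vdash P \mid Q \triangleleft K + K'}
\qquad
\frac{\varphi;\Phi;\Gamma \vdash P \triangleleft K}
     {\varphi;\Phi;\Gamma \vdash \mathtt{tick}.P \triangleleft K + 1}
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash a : \forall\tilde{i}.\mathsf{iserv}^K(\tilde{T}) \qquad (\varphi, \tilde{i});\Phi;\Gamma, \tilde{v} : \tilde{T} \vdash P \triangleleft K}
     {\varphi;\Phi;\Gamma \vdash\; !a(\tilde{v}).P \triangleleft 0}
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash a : \mathsf{in}(\tilde{T}) \qquad \varphi;\Phi;\Gamma, \tilde{v} : \tilde{T} \vdash P \triangleleft K}
     {\varphi;\Phi;\Gamma \vdash a(\tilde{v}).P \triangleleft K}
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash a : \mathsf{out}(\tilde{T}) \qquad \varphi;\Phi;\Gamma \vdash \tilde{e} : \tilde{T}}
     {\varphi;\Phi;\Gamma \vdash \overline{a}\langle\tilde{e}\rangle \triangleleft 0}
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash a : \forall\tilde{i}.\mathsf{oserv}^K(\tilde{T}) \qquad \varphi;\Phi;\Gamma \vdash \tilde{e} : \tilde{T}\{\tilde{I}/\tilde{i}\}}
     {\varphi;\Phi;\Gamma \vdash \overline{a}\langle\tilde{e}\rangle \triangleleft K\{\tilde{I}/\tilde{i}\}}
\]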


Fig. 9. Work Typing Rules for Processes 
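
Ignoring types altogether, the way the complexity index is assembled by these rules can be sketched for a small fragment: processes built from 0, parallel composition, tick, name creation, non-replicated input and output on simple channels. The Python function below is only such a syntax-directed reading, not a type checker; the tuple encoding and the name `work_bound` are assumptions of this illustration, and servers and pattern matching are deliberately left out.

```python
def work_bound(proc):
    """Work bound for the server-free, match-free fragment, read off the
    shape of the work typing rules."""
    tag = proc[0]
    if tag in ("nil", "out"):
        return 0                                   # 0 and channel outputs: bound 0
    if tag == "tick":
        return work_bound(proc[1]) + 1             # tick.P: K + 1
    if tag == "par":
        return work_bound(proc[1]) + work_bound(proc[2])   # P | Q: K + K'
    if tag == "new":
        return work_bound(proc[2])                 # (nu a)P: same bound as P
    if tag == "in":
        return work_bound(proc[3])                 # a(v~).P: bound of the continuation
    raise ValueError(f"unsupported construct: {tag}")


if __name__ == "__main__":
    # tick.a().tick.0 | (output on a | tick.0)
    example = ("par", ("tick", ("in", "a", (), ("tick", ("nil",)))),
                      ("par", ("out", "a", ()), ("tick", ("nil",))))
    print(work_bound(example))
```

The sum in the parallel case is what distinguishes work from span, where the two branches are combined by a maximum instead.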


3.2 Subject Reduction 


We now state the properties of this typing system. We do not detail the proofs 
as we will be more precise in the following sections with the type system for 
span. In the type system for work, we can easily obtain some properties such 
as weakening and strengthening and that index variables can be substituted by 
any index in a typing derivation. Finally, we have that substitution in processes 
preserves typing. With those properties, we obtain the usual subject reduction. 


Theorem 1 (Subject Reduction). If $\varphi;\Phi;\Gamma \vdash P \triangleleft K$
and $P \rightarrow Q$ then $\varphi;\Phi;\Gamma \vdash Q \triangleleft K$.

Then, we also obtain the following theorem.

Theorem 2 (Quantitative Subject Reduction). If $P \rightarrow_1 Q$ and
$\varphi;\Phi;\Gamma \vdash P \triangleleft K$ then we have
$\varphi;\Phi;\Gamma \vdash Q \triangleleft K'$ with
$\varphi;\Phi \vDash K' + 1 \le K$.

So, as a consequence we almost immediately obtain that K is indeed a bound
on the work of P if we have $\varphi;\Phi;\Gamma \vdash P \triangleleft K$.

Note that this soundness result is easily adaptable to similar type systems 
for work. As stated before, we can enrich the type system with other algebraic 
data-types and the proof can easily be adapted. Moreover, we can get rid of 
the distinction between channels and servers and take a similar typing for both, 
and we still get the soundness. We decided here to present this version as an 
introduction for the type system for span, but the work in itself can be of interest. 

For example, an interesting consequence of this soundness theorem is that it
immediately gives soundness for any subsystem. In particular, we detail in the
technical report [5] a (slightly) weaker typing system where the shape of types
is restricted in order to have an inference procedure close to the one in [4].


4 Types for Parallel Complexity 


We present here a type system for span, so we want as previously a type system 
such that typing a process gives us a bound on its span. Formally, we will prove 
the following theorem: 


Theorem 3 (Typing and Complexity). Let P be a process and m be its
global parallel complexity. If we have
$\varphi;\Phi;\Gamma \vdash P \triangleleft K$, then $\varphi;\Phi \vDash K \ge m$.

Remark that this theorem talks about open processes. However, our notion
of complexity does not behave well with open processes. For example the process
$\mathtt{match}(v)\ \{0 \mapsto P;;\ s(x) \mapsto Q\}$ is in normal form for a
variable v, so this process has global complexity 0. Still, we will also obtain
the following corollary:


Corollary 1 (Complexity and Open Processes).

- If $\varphi;\Phi;\Gamma, \tilde{v} : \tilde{T} \vdash P \triangleleft K$, then
  for any sequence of expressions $\tilde{e}$ such that
  $\varphi;\Phi;\Gamma \vdash \tilde{e} : \tilde{T}$, K is a bound on the global
  complexity of $P[\tilde{v} := \tilde{e}]$.

- If $\varphi;\Phi;\Gamma \vdash P \triangleleft K$, then for any other annotated
  process Q such that $\varphi;\Phi;\Gamma \vdash Q \triangleleft K'$,
  max(K, K') is a bound on the global complexity of P | Q.

So, when we give a typing $\varphi;\Phi;\Gamma \vdash P \triangleleft K$ for an
open process, we should not see K as a bound on the actual complexity of P, but
as a bound on the complexity of this particular process in an environment
respecting the type of $\Gamma$. So, in
$\varphi;\Phi; v : \mathsf{Nat}[2, 10] \vdash \mathtt{match}(v)\ \{0 \mapsto P;;\ s(x) \mapsto Q\} \triangleleft K$,
K is a bound on the complexity of this pattern matching under the assumption
that the environment gives to v an integer value between 2 and 10.


4.1 Size Types with Time 


The type system is an extension of the previous one. In order to take into account
parallelism, we need a way to synchronize the time between processes in parallel,
thus we will add some time information in types, as in [27] or [9].

Definition 8. The set of types and base types are given by the grammar:

\[
\mathcal{B} := \mathsf{Nat}[I, J] \mid \mathsf{List}[I, J](\mathcal{B})
\]

\[
T := \mathcal{B} \mid \mathsf{ch}_I(\tilde{T}) \mid \mathsf{in}_I(\tilde{T}) \mid \mathsf{out}_I(\tilde{T}) \mid \forall_I\tilde{i}.\mathsf{serv}^K(\tilde{T}) \mid \forall_I\tilde{i}.\mathsf{iserv}^K(\tilde{T}) \mid \forall_I\tilde{i}.\mathsf{oserv}^K(\tilde{T})
\]


As before, we have channel types, server types, and input/output capabilities
in those types. For a channel type or a server type, the index I is called the
time of this type. Giving a channel name the type $\mathsf{ch}_I(\tilde{T})$
ensures that communication on this channel should happen within time I. For
example, a channel name of type $\mathsf{ch}_0(\tilde{T})$ should be used to
communicate before any tick occurs. With this information, we can know when the
continuation of an input will be available. Likewise, a server name of type
$\forall_I\tilde{i}.\mathsf{iserv}^K(\tilde{T})$ should be used in a replicated
input, and this replicated input should be ready to receive for any time
greater than I. Typically, a process $\mathtt{tick}.!a(\tilde{v}).P$ enforces
that the type of a is $\forall_I\tilde{i}.\mathsf{iserv}^K(\tilde{T})$ with I at
least one, as the replicated input is not ready to receive at time zero.

As before, we define a notion of subtyping on those types. The rules are 
essentially the same as the ones in Figures 5 and 6. The only difference is that 
we force the time of a type to be invariant in subtyping. 

In order to write the typing rules, we need some other definitions to work 
with time in types. The first thing we need is a way to advance time. 


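Definition 9 (Advancing Time in Types). Given a set of index variables
$\varphi$, a set of constraints $\Phi$, a type T and an index I, we define T
after I time units, denoted $\langle T \rangle_{-I}^{\varphi;\Phi}$, by:

- $\langle \mathcal{B} \rangle_{-I}^{\varphi;\Phi} = \mathcal{B}$
- $\langle \mathsf{ch}_J(\tilde{T}) \rangle_{-I}^{\varphi;\Phi} = \mathsf{ch}_{(J-I)}(\tilde{T})$
  if $\varphi;\Phi \vDash J \ge I$. It is undefined otherwise.
  Other channel types follow exactly the same pattern.
- $\langle \forall_J\tilde{i}.\mathsf{serv}^K(\tilde{T}) \rangle_{-I}^{\varphi;\Phi} = \forall_{(J-I)}\tilde{i}.\mathsf{serv}^K(\tilde{T})$
  if $\varphi;\Phi \vDash J \ge I$. Otherwise,
  $\langle \forall_J\tilde{i}.\mathsf{serv}^K(\tilde{T}) \rangle_{-I}^{\varphi;\Phi} = \forall_{(J-I)}\tilde{i}.\mathsf{oserv}^K(\tilde{T})$.
- $\langle \forall_J\tilde{i}.\mathsf{iserv}^K(\tilde{T}) \rangle_{-I}^{\varphi;\Phi} = \forall_{(J-I)}\tilde{i}.\mathsf{iserv}^K(\tilde{T})$
  if $\varphi;\Phi \vDash J \ge I$. It is undefined otherwise.
- $\langle \forall_J\tilde{i}.\mathsf{oserv}^K(\tilde{T}) \rangle_{-I}^{\varphi;\Phi} = \forall_{(J-I)}\tilde{i}.\mathsf{oserv}^K(\tilde{T})$.

This definition can be extended to contexts, with
$\langle v : T, \Gamma \rangle_{-I}^{\varphi;\Phi} = v : \langle T \rangle_{-I}^{\varphi;\Phi}, \langle \Gamma \rangle_{-I}^{\varphi;\Phi}$
if $\langle T \rangle_{-I}^{\varphi;\Phi}$ is defined. Otherwise,
$\langle v : T, \Gamma \rangle_{-I}^{\varphi;\Phi} = \langle \Gamma \rangle_{-I}^{\varphi;\Phi}$.
We will often omit the $\varphi;\Phi$ in the notation when it is clear from the
context.

Recall that as the order $\le$ on indices is not total,
$\varphi;\Phi \nvDash J \ge I$ does not mean that $\varphi;\Phi \vDash J < I$.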


Let us explain this definition a bit. For base types, there is no time
indication, thus nothing happens. Then, one can wonder what happens when the
time of T is not greater than I. For non-server channel types, we consider that
their time is over, thus we erase them from the context. For servers this is a
bit more complicated. Indeed, when a server is defined, it must stay available
until the end. Thus, an output to a server should always be possible, no matter
the time. Still, the input capability of a server should not be available
eternally, as the time I is supposed to mean the time for which a replicated
input is effectively defined. So, when this time has passed, we should not be
able to define a replicated input any more.
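
The behaviour of the time-advancing operator can be sketched in Python for the special case where all times are concrete natural numbers, so that the order is total and the "otherwise" cases simply mean that the time of the type is smaller than the elapsed time. Types are encoded as pairs `(kind, time)` with message types omitted, since the operator leaves them unchanged; this encoding and the names `advance` and `advance_context` are assumptions of the illustration.

```python
def advance(ty, elapsed):
    """Advance a type by `elapsed` time units. Return None when the result
    is undefined. Times are concrete naturals in this sketch."""
    kind, time = ty
    if kind == "base":
        return ty                                   # base types carry no time
    remaining = max(time - elapsed, 0)              # truncated subtraction on indices
    if kind in ("ch", "in", "out"):
        return (kind, remaining) if time >= elapsed else None
    if kind == "serv":
        # A server that is too old keeps only its output capability.
        return ("serv", remaining) if time >= elapsed else ("oserv", remaining)
    if kind == "iserv":
        return ("iserv", remaining) if time >= elapsed else None
    if kind == "oserv":
        return ("oserv", remaining)                 # always defined
    raise ValueError(f"unknown type constructor: {kind}")


def advance_context(context, elapsed):
    """Advance every type of a context, erasing variables whose type becomes undefined."""
    result = {}
    for name, ty in context.items():
        advanced = advance(ty, elapsed)
        if advanced is not None:
            result[name] = advanced
    return result


if __name__ == "__main__":
    ctx = {"x": ("base", None), "a": ("ch", 3), "b": ("ch", 0),
           "f": ("serv", 0), "g": ("iserv", 0)}
    print(advance_context(ctx, 1))
```

After one time unit, the channel b and the input-only server g disappear from the context, the channel a has two units left, and the server f keeps only its output capability.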


Definition 10 (Time Invariant Context). Given a set of index variables
$\varphi$ and a set of constraints $\Phi$, a context $\Gamma$ is said to be time
invariant when it only contains base type variables or output server types
$\forall_I\tilde{i}.\mathsf{oserv}^K(\tilde{T})$ with $\varphi;\Phi \vDash I = 0$.


Such a context is thus invariant by the operator $\langle \cdot \rangle_{-I}$
for any I. This is typically the kind of context that we need to define a
server, as a server should not be dependent on the time it is called. We can now
present the type system. Typing rules for expressions and some processes do not
change; they can be found in Figure 7 and Figure 8. In Figure 10, we present the
remaining rules in this type system that differ from the ones in Figure 9. As
before, a typing judgement $\varphi;\Phi;\Gamma \vdash P \triangleleft K$
intuitively means that under the constraints $\Phi$, in a context $\Gamma$, a
process P is typable and its span complexity is bounded by K.


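\[
\frac{\varphi;\Phi;\Gamma \vdash P \triangleleft K \qquad \varphi;\Phi;\Gamma \vdash Q \triangleleft K}
     {\varphi;\Phi;\Gamma \vdash P \mid Q \triangleleft K}
\qquad
\frac{\varphi;\Phi;\langle\Gamma\rangle_{-1} \vdash P \triangleleft K}
     {\varphi;\Phi;\Gamma \vdash \mathtt{tick}.P \triangleleft K + 1}
\]

\[
\frac{\begin{array}{c}
\varphi;\Phi \vdash \langle\Gamma\rangle_{-I} \sqsubseteq \Gamma' \ \text{and $\Gamma'$ time invariant} \\
\varphi;\Phi;\Gamma, \Delta \vdash a : \forall_I\tilde{i}.\mathsf{iserv}^K(\tilde{T}) \qquad (\varphi, \tilde{i});\Phi;\Gamma', \tilde{v} : \tilde{T} \vdash P \triangleleft K
\end{array}}
     {\varphi;\Phi;\Gamma, \Delta \vdash\; !a(\tilde{v}).P \triangleleft I}
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash a : \mathsf{in}_I(\tilde{T}) \qquad \varphi;\Phi;\langle\Gamma\rangle_{-I}, \tilde{v} : \tilde{T} \vdash P \triangleleft K}
     {\varphi;\Phi;\Gamma \vdash a(\tilde{v}).P \triangleleft K + I}
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash a : \mathsf{out}_I(\tilde{T}) \qquad \varphi;\Phi;\langle\Gamma\rangle_{-I} \vdash \tilde{e} : \tilde{T}}
     {\varphi;\Phi;\Gamma \vdash \overline{a}\langle\tilde{e}\rangle \triangleleft I}
\]

\[
\frac{\varphi;\Phi;\Gamma \vdash a : \forall_I\tilde{i}.\mathsf{oserv}^K(\tilde{T}) \qquad \varphi;\Phi;\langle\Gamma\rangle_{-I} \vdash \tilde{e} : \tilde{T}\{\tilde{I}/\tilde{i}\}}
     {\varphi;\Phi;\Gamma \vdash \overline{a}\langle\tilde{e}\rangle \triangleleft K\{\tilde{I}/\tilde{i}\} + I}
\]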


Fig. 10. Span Typing Rules for Processes 


The rule for parallel composition shows that we consider parallel complexity,
as we take the maximum between the two processes instead of the sum. In practice,
we ask for the same complexity $K$ in both branches of a parallel composition,
but with the subtyping rule, this indeed corresponds to the maximum. For the input
server rule, we integrate some weakening on the context ($\Delta$), and we require a time
invariant context to type the server, as a server should not depend on time. Weakening
is important since some types, such as channel types, are not time invariant. So, we
need to separate the time invariant types that can be used in the continuation $P$
from the other types.

Some rules make time advance in their continuation, for example the tick
rule or the input rule. This is expressed by the advance time operator on contexts,
and because time advances, the complexity also increases. Also, remark that
because of the advance of time, some channel names could disappear, so there
is a kind of "time uniqueness" for channels, contrary to the previous section. This
will be detailed later. Also, note that in the rule for replicated input, there is an
explicit subtyping in the premises. This is because $\langle\Gamma\rangle_{-I}$ is not time invariant,
since the type of $a$ is at least $\forall_0 \tilde{i}.\mathtt{iserv}^K(\tilde{T})$ in this case. However, if this server
had both input and output capabilities, we can give a time invariant type to $a$
(or to other servers) just by removing the input capability, which can be done by
subtyping.

Looking back at Corollary 1, we can for example understand the rule for
input by taking the judgement $\varphi;\Phi; a : \mathtt{ch}_3() \vdash a().\mathtt{tick}.0 \triangleleft 4$. This expresses
that with an environment providing a message on $a$ within 3 time units, this
process terminates in 4 time units.
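To make the bookkeeping of the tick, input, output and parallel rules concrete, here is a minimal Python sketch that computes a span bound for a tiny fragment of the calculus. It is a simplification and not the type system itself: it assumes plain channels without arguments, concrete integer times, no servers and no subtyping, and it takes a maximum for parallel composition directly. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Tick:
    cont: "Proc"


@dataclass(frozen=True)
class Par:
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class In:
    chan: str
    cont: "Proc"


@dataclass(frozen=True)
class Out:
    chan: str


Proc = Union[Nil, Tick, Par, In, Out]


def advance(ctx: Dict[str, int], j: int) -> Dict[str, int]:
    """Advance time by j: channels whose time is smaller than j are erased."""
    return {a: t - j for a, t in ctx.items() if t >= j}


def span_bound(ctx: Dict[str, int], p: Proc) -> int:
    """Return a bound K for p in ctx, or raise TypeError if a channel is gone."""
    if isinstance(p, Nil):
        return 0
    if isinstance(p, Tick):
        return span_bound(advance(ctx, 1), p.cont) + 1
    if isinstance(p, Par):
        return max(span_bound(ctx, p.left), span_bound(ctx, p.right))
    if p.chan not in ctx:
        raise TypeError(f"channel {p.chan} is no longer available")
    time = ctx[p.chan]
    if isinstance(p, Out):
        return time
    # input: the continuation is typed after advancing by the channel time
    return span_bound(advance(ctx, time), p.cont) + time
```

With this sketch, `span_bound({"a": 3}, In("a", Tick(Nil())))` evaluates to 4, matching the judgement above.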

Finally, we can see that if we remove all size annotations and merge server 
types and channel types together, we get back the classical input/output types, 
and all the rules described here are admissible in the classical input/output type 
system for the π-calculus.


4.2 Examples 


An Example to Justify the Use of Time. In order to justify the use of
time in types for span, and to show how we could find the time of a channel, we
present here three examples of recursive calls with different behaviours. We do not
detail a typing derivation here; a more detailed example is described later,
in Section 5. Usually, type inference for a size system reduces to satisfying a set
of constraints on indices. We believe that even with time indexes on channels,
type inference is still reducible to satisfying such a set of constraints. So, for the
sake of simplicity, we will describe this example with constraints.
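We define three processes $P_1$, $P_2$ and $P_3$ by:


$$P_i =\ !a(n,r).\mathtt{tick}.\mathtt{match}(n)\ \{0 \mapsto \overline{r}\langle\rangle\ ;;\ s(m) \mapsto (\nu r')(\nu r'')(Q_i)\}$$


for the following definition of $Q_i$:


$$\begin{aligned}
Q_1 &= \overline{a}\langle m,r'\rangle \mid \overline{a}\langle m,r''\rangle \mid r'().r''().\overline{r}\langle\rangle \\
Q_2 &= \overline{a}\langle m,r'\rangle \mid r'().\overline{a}\langle m,r''\rangle \mid r''().\overline{r}\langle\rangle \\
Q_3 &= \overline{a}\langle m,r'\rangle \mid r'().(\overline{a}\langle m,r''\rangle \mid \overline{r}\langle\rangle) \mid r''().0
\end{aligned}$$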


So intuitively, for $P_1$ the two recursive calls are done in parallel after one unit
of time, and the return signal on $r$ is done when both processes have done
their return signal on $r'$ and $r''$. So, this is total parallelism for the two recursive
calls (the span is linear in $n$). For $P_2$, a first recursive call is done, and then
the process waits for the return signal on $r'$, and when it receives it, the second
recursive call begins. So, this is totally sequential (the span is exponential in $n$).
Finally, for $P_3$ we have an intermediate situation between totally parallel and
totally sequential. The process starts with a recursive call. Then, it waits for the
return signal on $r'$. When this signal arrives, it immediately starts the second
recursive call and immediately does the return signal on $r$. So, intuitively, the
second recursive call starts when all the "left" calls have been done. Note that
those three servers have the same work, which is exponential in $n$.
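The three behaviours can be checked numerically with a small Python sketch. It is an informal reading of the description above rather than an execution of the π-calculus semantics: for each scheme it tracks, for an argument $n$, the time at which the reply on $r$ is sent and the time at which all activity has finished, assuming that a call on 0 costs one tick and then replies. The function names are hypothetical.

```python
def simulate(scheme: int, n: int):
    """Return (reply_time, span) for a call with argument n under scheme 1, 2 or 3."""
    reply, span = 1, 1  # n = 0: one tick, then the reply on r
    for _ in range(n):
        if scheme == 1:
            # both calls start after the tick; r is sent once r' and r'' arrived
            reply, span = 1 + reply, 1 + max(reply, span)
        elif scheme == 2:
            # the second call starts only when the first one has replied on r'
            reply, span = 1 + 2 * reply, 1 + max(reply + span, 2 * reply)
        elif scheme == 3:
            # after r', reply on r at once and start the second call
            reply, span = 1 + reply, 1 + reply + span
        else:
            raise ValueError("scheme must be 1, 2 or 3")
    return reply, span


def work(n: int) -> int:
    """Total number of ticks: one tick plus two recursive calls, in every scheme."""
    total = 1
    for _ in range(n):
        total = 1 + 2 * total
    return total
```

Under these assumptions the span grows as $n+1$ for the first scheme, $2^{n+1}-1$ for the second and $(n+1)(n+2)/2$ for the third, while the work is $2^{n+1}-1$ in all three cases.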

So, let us type the three examples with the type system for span. For the sake 
of simplicity, we omit the typing of expressions, we only consider the difficult 
branch for the match constructors, and we focus on complexity and time. We 
consider the following context that is used for the three processes: 


$$\Gamma = a : \forall_0 i.\mathtt{oserv}^{f(i)}(\mathsf{Nat}[0,i], \mathtt{ch}_{g(i)}()),\ n : \mathsf{Nat}[0,i],\ r : \mathtt{ch}_{g(i)}()$$


We have two unknown function symbols: f, that represents the complexity of the 
server, and g, the time for the return channel. We also use this second context: 


$$\Delta = \langle\Gamma\rangle_{-1},\ m : \mathsf{Nat}[0,i-1],\ r' : \mathtt{ch}_{g'(i)}(),\ r'' : \mathtt{ch}_{g''(i)}()$$


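This gives two more unknown functions, $g'$ and $g''$, corresponding respectively
to the time of $r'$ and $r''$ when defined. The three processes start with the same
typing. We use a double line to express that we do not use a real typing rule, so
we can omit some premises or apply several typing rules at once.

$$\begin{array}{c}
i;\ i \ge 1;\ \Delta \vdash Q_i \triangleleft f(i)-1 \\ \hline\hline
i;\ \cdot;\ \langle\Gamma\rangle_{-1} \vdash \mathtt{match}(n)\ \{0 \mapsto \overline{r}\langle\rangle\ ;;\ s(m) \mapsto (\nu r')(\nu r'')(Q_i)\} \triangleleft f(i)-1 \\ \hline\hline
i;\ \cdot;\ \Gamma \vdash \mathtt{tick}.\mathtt{match}(n)\ \{0 \mapsto \overline{r}\langle\rangle\ ;;\ s(m) \mapsto (\nu r')(\nu r'')(Q_i)\} \triangleleft f(i) \\ \hline\hline
\cdot;\ \cdot;\ a : \forall_0 i.\mathtt{oserv}^{f(i)}(\mathsf{Nat}[0,i], \mathtt{ch}_{g(i)}()) \vdash P_i \triangleleft 0
\end{array}$$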


The first thing to remark is that the typing uses the tick typing rule. In this
rule, the complexity in the conclusion must have the shape $K + 1$ for
some $K$, so here we immediately obtain $f(i) \ge 1$. In the same way, $r$ should
still be defined in $\langle\Gamma\rangle_{-1}$, so by definition of time advance, it means that $g(i) \ge 1$.

Then, for the three processes, the typing gives the following conditions on
the indices, for $i \ge 1$. For $Q_1$:


$$\begin{array}{lll}
f(i)-1 \ge f(i-1) & g'(i) = g(i-1) & g''(i) = g(i-1) \\
g''(i) \ge g'(i) & g(i)-1 \ge g''(i) & f(i)-1 \ge g(i)-1
\end{array}$$

The first constraint holds because the total complexity $f(i)-1$ must be at least
the complexity of the two recursive calls, $f(i-1)$. Then, $r'$ and $r''$ must have
a time equal to $g(i-1)$ in order to correspond to the type of $a$ in the outputs
$\overline{a}\langle m,r'\rangle$ and $\overline{a}\langle m,r''\rangle$. Finally, as $r''$ waits for input after $r'$, the time of $r''$ must
be at least the time of $r'$. Similarly, the time of $r$ (which is equal to $g(i)-1$)
must be at least the time of $r''$, and the total complexity $f(i)-1$ must be
at least the complexity of $r'().r''().\overline{r}\langle\rangle$, which is equal to the time of $r$. So,
we can satisfy the conditions with the following choice:


$$f(i) = i+1 \qquad g(i) = i+1 \qquad g'(i) = g''(i) = i$$


So, as expected, the span, represented by the function $f$, is indeed linear.
Then, for $Q_2$, the second call is delayed by $g'(i)$ time units because we need
to wait for $r'$. Thus, we obtain the following constraints.


$$\begin{array}{lll}
f(i)-1 \ge f(i-1) & g'(i) = g(i-1) & f(i)-1 \ge g'(i) + f(i-1) \\
g''(i) - g'(i) = g(i-1) & g(i)-1 \ge g''(i) & f(i)-1 \ge g(i)-1
\end{array}$$

This delay of $g'(i)$ time units can be seen in the third and fourth constraints.
Again, the third constraint holds because the complexity should be at least the
complexity of the second call, and the fourth because the type of $r''$ should
correspond to the type in $a$. Thus, we can take


$$f(i) = 2^{i+1}-1 \qquad g(i) = 2^{i+1}-1$$


So, we indeed obtain the exponential complexity. 

However, with those two examples, the time of the channel $r$ is always equal
to the complexity of the server $a$, so we cannot really see the usefulness of time.
Still, with the next example we obtain something more interesting. So, for $Q_3$,
this time the fifth constraint on $g(i)$ (depending on when the output to $r$ is done)
is different, and we obtain:


$$\begin{array}{llll}
f(i)-1 \ge f(i-1) & g'(i) = g(i-1) & f(i)-1 \ge g'(i) + f(i-1) & \\
g''(i) - g'(i) = g(i-1) & g(i)-1 \ge g'(i) & f(i)-1 \ge g(i)-1 & f(i)-1 \ge g''(i)
\end{array}$$

The last constraint holds because, again, the complexity should be at least the
complexity of calling $r''$. So, using the equalities, and by removing redundant
inequalities, we obtain for $f$ and $g$:


$$f(i) \ge 1 + g(i-1) + f(i-1) \qquad g(i) \ge 1 + g(i-1) \qquad f(i) \ge 1 + 2 \cdot g(i-1)$$

Thus, we can take:


$$g(i) = i+1 \qquad f(i) = \frac{(i+1)(i+2)}{2}$$


The complexity is quadratic in n. Note that for this example, the complexity 
f depends directly on g, and g is given by a recursive equation independent of f. 
So in a sense, to find the complexity, we need to find first the delay of the second 
recursive call. Without time indications on channel, it would not be possible to 
track and obtain this recurrence relation on g and thus we could not deduce the 
complexity. 
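The three sets of constraints can be checked mechanically. The following Python sketch restates them as read from the discussion above and tests candidate solutions on a finite range of indices; it is a sanity check and not a proof. The values used for $g'$ and $g''$ in the second and third cases are derived here from the equalities and are an assumption of this sketch, as are all function names.

```python
def q1_constraints(f, g, g1, g2, i):
    return [
        f(i) - 1 >= f(i - 1),
        g1(i) == g(i - 1),
        g2(i) == g(i - 1),
        g2(i) >= g1(i),
        g(i) - 1 >= g2(i),
        f(i) - 1 >= g(i) - 1,
    ]


def q2_constraints(f, g, g1, g2, i):
    return [
        f(i) - 1 >= f(i - 1),
        g1(i) == g(i - 1),
        f(i) - 1 >= g1(i) + f(i - 1),
        g2(i) - g1(i) == g(i - 1),
        g(i) - 1 >= g2(i),
        f(i) - 1 >= g(i) - 1,
    ]


def q3_constraints(f, g, g1, g2, i):
    return [
        f(i) - 1 >= f(i - 1),
        g1(i) == g(i - 1),
        f(i) - 1 >= g1(i) + f(i - 1),
        g2(i) - g1(i) == g(i - 1),
        g(i) - 1 >= g1(i),
        f(i) - 1 >= g(i) - 1,
        f(i) - 1 >= g2(i),
    ]


# (constraints, f, g, g', g'') for each example
SOLUTIONS = {
    "Q1": (q1_constraints, lambda i: i + 1, lambda i: i + 1,
           lambda i: i, lambda i: i),
    "Q2": (q2_constraints, lambda i: 2 ** (i + 1) - 1, lambda i: 2 ** (i + 1) - 1,
           lambda i: 2 ** i - 1, lambda i: 2 ** (i + 1) - 2),
    "Q3": (q3_constraints, lambda i: (i + 1) * (i + 2) // 2, lambda i: i + 1,
           lambda i: i, lambda i: 2 * i),
}


def check(name: str, upto: int = 20) -> bool:
    """Check every constraint of the named example for 1 <= i <= upto."""
    constraints, f, g, g1, g2 = SOLUTIONS[name]
    return all(all(constraints(f, g, g1, g2, i)) for i in range(1, upto + 1))
```

Calling `check("Q1")`, `check("Q2")` and `check("Q3")` confirms the linear, exponential and quadratic choices on the tested range, whereas a linear $f$ fails the third constraint of the last example.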

Note that the first two examples used channels as a return signal for a parallel
computation, whereas for the last example, channels are used as a synchronization
point in the middle of a computation. We believe that this flexibility of
channels justifies the use of the π-calculus to reason about parallel computation.
Moreover, this work is a step towards a more expressive type system inspired by [27],
taking into account concurrent behaviour. Indeed, as we will show, the current
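type system fails to capture some simple concurrency.

Limitations of the Type System. Our current type system enforces some
kind of time uniqueness in channels. Indeed, take the process $a().\mathtt{tick}.\overline{a}\langle\rangle$. When
trying to type this process, we obtain:


$$\dfrac{\dfrac{\cdot;\cdot \vdash \mathtt{ch}_I() \sqsubseteq \mathtt{in}_I()}{\cdot;\cdot; a : \mathtt{ch}_I() \vdash a : \mathtt{in}_I()} \qquad \dfrac{\dfrac{\text{Error}}{\cdot;\cdot; \langle a : \mathtt{ch}_0()\rangle_{-1} \vdash \overline{a}\langle\rangle \triangleleft 0}}{\cdot;\cdot; a : \mathtt{ch}_0() \vdash \mathtt{tick}.\overline{a}\langle\rangle \triangleleft 1}}{\cdot;\cdot; a : \mathtt{ch}_I() \vdash a().\mathtt{tick}.\overline{a}\langle\rangle \triangleleft I+1}$$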


As by definition $\langle a : \mathtt{ch}_0()\rangle_{-1}$ is $\emptyset$, we cannot type the output on $a$. So, channels
have strong constraints on the time at which they can be used. This is true especially
when channels are not used linearly. Still, note that we can type a process of
the shape $a().0 \mid \overline{a}\langle\rangle \mid \mathtt{tick}.\overline{a}\langle\rangle$, so it is better than plain linearity on channels.
This restriction limits examples of concurrent behaviours. For example, take two
processes $P_1$ and $P_2$ that should be executed but not simultaneously. In order
to do that in a concurrent setting, we can use semaphores. In the π-calculus, we
could consider the process $(\nu a)(a().P_1' \mid a().P_2' \mid \overline{a}\langle\rangle)$, where $P_1'$ is $P_1$ with an
output $\overline{a}\langle\rangle$ at the end, and likewise for $P_2'$. This is a way to simulate a semaphore
in the π-calculus. Now, we can see that this example has the same problem as the
example given above if, for example, $P_1$ contains a tick, so we cannot type this
kind of process.

Still, we believe that for parallel computation, our type system should be
quite expressive in practice. Indeed, as stated above, the restriction appears
especially when channels are not used linearly. However, it is known that the linear
π-calculus in itself is expressive for parallel computation [31]. For example, classical
encodings of functional programs in a parallel setting rely on the use of linear
return signals, as we will see in the example for bitonic sort in Sect. 5. Moreover,
session types can also be encoded in the linear π-calculus in the presence of variant
types [28,8]. Note that in order to encode a calculus such as the one in [9], we would
also need recursive types. Our calculus and its proof of soundness could be
extended to variant types, but not straightforwardly to recursive types. However,
we believe the results on the linear π-calculus we cited suggest that the restriction
given above should not be too harmful for parallel computation.


4.3 Complexity Results 


In this section, we show how to prove that our type system indeed gives a 
bound on the number of time reduction steps of a process following the maximal 
progress assumption. We only give in this section intuitions about the proofs. 
The detailed proofs can be found in the technical report [5]. 

In the following, as we will work with the reduction $\Rightarrow_p$, we need to
consider annotated processes instead of simple processes. So, we need to enrich
our type system with a rule for the constructor $n : P$.


$$\frac{\varphi;\Phi;\langle\Gamma\rangle_{-n} \vdash P \triangleleft K}{\varphi;\Phi;\Gamma \vdash n : P \triangleleft K+n}$$


78 P. Baillot and A. Ghyselen 


As the intuition suggested, this rule is equivalent to n times the typing rule 
for tick. We can now work on the properties of our type system on annotated 
processes. 

The procedure to prove the subject reduction for $\Rightarrow_p$ in this type system is
intrinsically more difficult than the one for Theorem 1. So, from the proof of 
subject reduction for span, one could deduce the proof of subject reduction for 
work, just by forgetting the consideration with time and the constructor n : P 
in the following proof. Thus, in the technical report, only the proof for span is 
detailed. 

Again, we have both weakening and strengthening in this type system. We 
also have a property specific to size type systems, expressing that an index 
variable can be substituted by any index. We also need a lemma specific to the 
notion of time. 


Definition 11 (Delaying). Given a type $T$ and an index $I$, we define the
delaying of $T$ by $I$ units of time, denoted $T_{+I}$:


$$\mathcal{B}_{+I} = \mathcal{B} \qquad (\mathtt{ch}_J(\tilde{T}))_{+I} = \mathtt{ch}_{J+I}(\tilde{T})$$


and for other channel and server types, the definition is in correspondence with 
the one on the right above. This definition can be extended to contexts. 


Lemma 2 (Delaying). If $\varphi;\Phi;\Gamma \vdash P \triangleleft K$ then $\varphi;\Phi;\Gamma_{+I} \vdash P \triangleleft K+I$.
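As a small worked instance of this statement (an illustration built from the judgement $a : \mathtt{ch}_3() \vdash a().\mathtt{tick}.0 \triangleleft 4$ discussed earlier, not a derivation taken from the technical report), delaying the context by $I = 2$ turns $\mathtt{ch}_3()$ into $\mathtt{ch}_5()$, and the bound moves from 4 to 6:

```latex
\begin{align*}
  &\varphi;\Phi;\ a : \mathtt{ch}_3() \vdash a().\mathtt{tick}.0 \triangleleft 4
    && \text{(starting judgement)} \\
  &(a : \mathtt{ch}_3())_{+2} = a : \mathtt{ch}_{3+2}() = a : \mathtt{ch}_5()
    && \text{(delaying by } I = 2\text{)} \\
  &\varphi;\Phi;\ a : \mathtt{ch}_5() \vdash a().\mathtt{tick}.0 \triangleleft 5 + 1 = 4 + 2
    && \text{(input at time 5, then one tick)}
\end{align*}
```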


With this lemma, we can see that if we add a delay of $I$ time units in the
contexts for all channels, it increases the complexity by $I$ time units, so we
see the link between time in types and the complexity. Then, we can show the
usual substitution lemma.


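Lemma 3 (Substitution).


1. If $\varphi;\Phi;\Gamma,v:T \vdash e' : U$ and $\varphi;\Phi;\Gamma \vdash e : T$ then $\varphi;\Phi;\Gamma \vdash e'[v := e] : U$.
2. If $\varphi;\Phi;\Gamma,v:T \vdash P \triangleleft K$ and $\varphi;\Phi;\Gamma \vdash e : T$ then $\varphi;\Phi;\Gamma \vdash P[v := e] \triangleleft K$.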


Finally, we can show that typing behaves well with congruence. 


Lemma 4 (Congruence and Typing). Let $P$ and $Q$ be annotated processes
such that $P \equiv Q$. Then, $\varphi;\Phi;\Gamma \vdash P \triangleleft K$ if and only if $\varphi;\Phi;\Gamma \vdash Q \triangleleft K$.


And with all this, we obtain the subject reduction. 


Theorem 4 (Subject Reduction). If $\varphi;\Phi;\Gamma \vdash P \triangleleft K$ and $P \Rightarrow_p Q$ then
$\varphi;\Phi;\Gamma \vdash Q \triangleleft K$.


The proof is done by induction on $P \Rightarrow_p Q$. The proof can be rather tedious
because of subtyping and input/output types that generate a lot of cases for
subtyping, and, as expected, the most difficult cases are for communications.

Now that we have the subject reduction for $\Rightarrow_p$, we can easily deduce a more
generic form of Theorem 3.


Theorem 5. Let $P$ be an annotated process and let $m$ be its global parallel
complexity. Then, for a typing $\varphi;\Phi;\Gamma \vdash P \triangleleft K$, we have $\varphi;\Phi \vDash K \ge m$.


Corollary 1 is then obtained with the substitution lemma and the rule for 
parallel composition. 


5 An Example: Bitonic Sort 


As an example for this type system, we show how to obtain the bound on a
common parallel algorithm: bitonic sort [1]. The particularity of this sorting
algorithm is that it admits a parallel complexity in $O(\log(n)^2)$. We will show
here that our type system allows us to derive this bound for the algorithm, just as
a pen-and-paper analysis would. Actually, we consider here a version for lists, which
is not optimal for the number of operations, but we obtain the usual number of
comparisons. For the sake of simplicity, we present here the algorithm for lists whose
size is a power of 2. Let us briefly sketch the ideas of this algorithm. For a formal
description see [1].


— A bitonic sequence is either a sequence composed of an increasing sequence 
followed by a decreasing sequence (e.g. [2, 7, 23, 19, 8, 5]), or a cyclic rotation 
of such a sequence (e.g. [8, 5, 2, 7, 23, 19]). 

— The algorithm uses 2 main functions, bmerge and bsort. 

— bmerge takes a bitonic sequence and recursively sorts it, as follows: 
Assume $s = [a_0, \ldots, a_{n-1}]$ is a bitonic sequence such that $[a_0, \ldots, a_{n/2-1}]$ is
increasing and $[a_{n/2}, \ldots, a_{n-1}]$ is decreasing, then we consider:
$s_1 = [\min(a_0, a_{n/2}), \min(a_1, a_{n/2+1}), \ldots, \min(a_{n/2-1}, a_{n-1})]$
$s_2 = [\max(a_0, a_{n/2}), \max(a_1, a_{n/2+1}), \ldots, \max(a_{n/2-1}, a_{n-1})]$

Then we have: $s_1$ and $s_2$ are bitonic and satisfy $\forall x \in s_1, \forall y \in s_2, x \le y$.
bmerge then applies recursively to $s_1$ and $s_2$ to produce a sorted sequence.

— bsort takes a sequence and recursively sorts it. It starts by separating the 
sequence in two. Then, it recursively sorts the first sequence in increasing 
order, and the second sequence in decreasing order. With this, we obtain a 
bitonic sequence that can be sorted with bmerge. 
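The ideas listed above can be sketched as an ordinary sequential Python program. This is not the π-calculus encoding studied in this section, only an illustration of the algorithm under the same assumption that the list length is a power of 2; the function names mirror bcompare, bmerge and bsort, and the `up` flag selects increasing or decreasing order.

```python
from typing import List, Tuple


def bcompare(l1: List[int], l2: List[int]) -> Tuple[List[int], List[int]]:
    """Pointwise minima and maxima of two lists of the same length."""
    mins = [min(x, y) for x, y in zip(l1, l2)]
    maxs = [max(x, y) for x, y in zip(l1, l2)]
    return mins, maxs


def bmerge(up: bool, seq: List[int]) -> List[int]:
    """Sort a bitonic sequence whose length is a power of 2."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    mins, maxs = bcompare(seq[:half], seq[half:])
    # both halves are bitonic again, and every minimum is below every maximum
    low, high = bmerge(up, mins), bmerge(up, maxs)
    return low + high if up else high + low


def bsort(up: bool, seq: List[int]) -> List[int]:
    """Sort a sequence whose length is a power of 2."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    increasing = bsort(True, seq[:half])
    decreasing = bsort(False, seq[half:])
    # an increasing part followed by a decreasing part is bitonic
    return bmerge(up, increasing + decreasing)
```

For example, `bsort(True, [3, 1, 4, 2])` sorts in increasing order and `bsort(False, [3, 1, 4, 2])` in decreasing order. In a parallel setting, the two recursive calls in each function are independent of each other, which is where the parallel complexity comes from.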


We will encode this algorithm in the π-calculus with a boolean type. As expressed
before, our results can easily be extended to support booleans with a conditional
constructor.

First, we suppose that a server for comparison lessthan is already imple- 
mented. We start with bcompare such that given two lists of same length, it 
creates the list of maximum and the list of minimum. This is described in Fig- 
ure 11. 

We present here the typing intuitively. To begin with, we suppose that
lessthan is given the server type $\mathtt{oserv}_0^0(\mathcal{B}, \mathcal{B}, \mathtt{ch}_0(\mathsf{Bool}))$, saying that this is a
server ready to be called, and that it takes as input a channel that is used to return
the boolean value. With this, we can give to bcompare the following server type:


$$\forall_0 i.\mathtt{serv}^1(\mathsf{List}[0,i](\mathcal{B}),\ \mathsf{List}[0,i](\mathcal{B}),\ \mathtt{out}_1(\mathsf{List}[0,i](\mathcal{B}),\ \mathsf{List}[0,i](\mathcal{B})))$$


!bcompare(l1, l2, a). match(l1) {
    [] ↦ ā⟨l1, l2⟩ ;;
    x :: l1' ↦ match(l2) {
        [] ↦ ā⟨l1, l2⟩ ;;
        y :: l2' ↦ (νb)(νc)(
            bcompare⟨l1', l2', b⟩ | tick.lessthan⟨x, y, c⟩
          | b(lm, lM).c(z).if z then ā⟨x :: lm, y :: lM⟩ else ā⟨y :: lm, x :: lM⟩
        )
    }
}
!bmerge(up, l, a). match(l) {
    [] ↦ ā⟨l⟩ ;;
    [y] ↦ ā⟨l⟩ ;;
    _ ↦ let (l1, l2) = partition(l) in (νb)(νc)(νd)(
        bcompare⟨l1, l2, b⟩ | b(p1, p2).(bmerge⟨up, p1, c⟩ | bmerge⟨up, p2, d⟩)
      | c(q1).d(q2).if up then let l' = q1 @ q2 in ā⟨l'⟩
                     else let l' = q2 @ q1 in ā⟨l'⟩
    )
}
!bsort(up, l, a). match(l) {
    [] ↦ ā⟨l⟩ ;;
    [y] ↦ ā⟨l⟩ ;;
    _ ↦ let (l1, l2) = partition(l) in (νb)(νc)(νd)(
        bsort⟨tt, l1, b⟩ | bsort⟨ff, l2, c⟩
      | b(q1).c(q2).let q = q1 @ q2 in bmerge⟨up, q, d⟩ | d(p).ā⟨p⟩
    )
}


Fig. 11. Bitonic Sort 


The important things to notice are that this server has complexity 1, and that the
channel taken as input has time 1. In order to verify that this type is correct,
we would first need to apply the rule for replicated input. Let us denote by $\Gamma$ the
hypothesis on those two server names, and let $\Gamma'$ be as $\Gamma$ except that for bcompare
we only have the output capability. Then, $\Gamma'$ is indeed time invariant, and we
have $\langle\Gamma\rangle_{-0} \sqsubseteq \Gamma'$, so we can continue the typing with this context $\Gamma'$. Then, we
need to show that the process after the replicated input indeed has complexity
1. In the cases of an empty list, this can be done easily. In the non-empty case, for
the $\nu$ constructor, we must give a type to the channels $b$ and $c$. We use:


$$b : \mathtt{ch}_1(\mathsf{List}[0,i-1](\mathcal{B}), \mathsf{List}[0,i-1](\mathcal{B})) \qquad c : \mathtt{ch}_1(\mathsf{Bool})$$

And we can then type the different processes in parallel.


— For the call to bcompare, the arguments have the expected type, and this
call has complexity 1 because of the type of bcompare.
— For the process tick.lessthan⟨x, y, c⟩, the tick enforces a decrease of time by
1 in the context. This modifies in particular the time of $c$, which becomes 0.
Thus, we can do the call to lessthan as everything is well-typed.

— Finally, for the last process, because $b$ has a time equal to 1, the first input has
complexity 1 and it again enforces a decrease of 1 time unit. In particular,
the times of $c$ and $a$ become 0. Then, as there is no more tick and all
channels have time 0, the typing proceeds without difficulties.


So, we can indeed give this server type to bcompare, and thus we can call this 
server and it generates a complexity of 1. 

Then, to present the process for bitonic sort, let us use the macro let $\tilde{v}$ =
$f(\tilde{e})$ in $P$ to represent $(\nu a)(\overline{f}\langle\tilde{e},a\rangle \mid a(\tilde{v}).P)$, and let us also use a generalized
pattern matching. We also assume that we have a function for concatenation
of lists and a function partition taking a list of size $2n$, and giving two lists
corresponding to the first $n$ elements and the last $n$ elements. Then, the process
for bitonic sort is given in Figure 11.

Without going into details, the main point in the typing of those processes
is to find a solution to a recurrence relation for the complexity of server types.
In the typing of bmerge, we suppose given a list of size smaller than $2^i$ and we
choose both the complexity of this type and the time of the channel $a$ equal to
a certain index $K$ (with $i$ free in $K$). So, it means we choose for bmerge the type:


$$\forall_0 i.\mathtt{serv}^K(\mathsf{Bool},\ \mathsf{List}[0,2^i](\mathcal{B}),\ \mathtt{out}_K(\mathsf{List}[0,2^i](\mathcal{B})))$$

Then, the typing gives us the following condition.

$$i \ge 1 \text{ implies } K \ge 1 + K\{i-1/i\}$$


Indeed, the two recursive calls to bmerge are done after one unit of time (because
the input $b(p_1, p_2)$ takes one unit of time, as expressed by the type of bcompare),
and with a list of size $2^{i-1}$. And then, the continuation after those recursive calls
(the process after $c(q_1).d(q_2)$) does not generate any complexity. So, we can take
$K = i$, and thus bmerge has logarithmic complexity. Then, in the same way we
obtain a recurrence relation for the complexity $K'$ of bsort on an input list of
size smaller than $2^i$.


$$i \ge 1 \text{ implies } K' \ge K'\{i-1/i\} + i$$


Again, the two recursive calls are done on lists of size $2^{i-1}$. This time, the delay
of $i$ in the recurrence relation is given by the continuation, because of the call to
bmerge that generates a complexity of $i$. Thus, we can take a $K'$ in $O(i^2)$, and
we obtain in the end that bitonic sort is indeed in $O(\log(n)^2)$ on a list of size $n$.
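The two recurrences can be unfolded numerically to see the closed forms appear. The Python sketch below computes the least values satisfying each condition, starting from a base value of 0 at $i = 0$; this base value and the particular closed form $i(i+1)/2$ for $K'$ are assumptions of the sketch (the text above only asks for some $K'$ in $O(i^2)$), and `least_solution` is a hypothetical helper.

```python
from typing import Callable, List


def least_solution(step: Callable[[int, int], int], upto: int, base: int = 0) -> List[int]:
    """Unfold K(i) = step(i, K(i-1)) for 1 <= i <= upto, starting from K(0) = base."""
    values = [base]
    for i in range(1, upto + 1):
        values.append(step(i, values[i - 1]))
    return values


# bmerge: K >= 1 + K{i-1/i}
merge_bound = least_solution(lambda i, prev: 1 + prev, 10)

# bsort: K' >= K'{i-1/i} + i
sort_bound = least_solution(lambda i, prev: prev + i, 10)
```

Here `merge_bound[i]` equals `i` and `sort_bound[i]` equals `i * (i + 1) // 2`, so on a list of size $n = 2^i$ the bound for bsort is $\log_2(n)(\log_2(n)+1)/2$, which is in $O(\log(n)^2)$.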

Remark that in this example, the type system gives recurrence relations cor- 
responding to the usual recurrence relations we would obtain with a complexity 
analysis by hand. Here, the recurrence relation is only on K because channel 
names are only used as return channels, so their time is always equal to the 
complexity of the server that uses them. In general this is not the case as we 
saw before, so we obtain in general mutually recurrent relations when defining 
a server.


6 Related Work


An analysis of the complexity of parallel functional programs based on types has
been carried out in [23]. Their system can analyse the work and the span (called
depth in that paper), and makes use of amortized complexity analysis, which
makes it possible to obtain sharp bounds. However, the kind of parallelism they analyse is
limited to parallel composition. So on the one hand we are considering a more
general model of parallelism, and on the other hand we are not taking advantage
of amortized analysis as they do. The paper [17] proposes a complexity analysis of
parallel functional programs written in interaction nets, a graph-based language
derived from linear logic. Their analysis is based on size types. However, their
model is also quite different from ours, as interaction nets do not provide
name-passing.

Other works like [2] tackle the problem of analysing the parallel complexity
of a distributed system by building a distributed flow graph and searching for
a path of maximal cost in this graph. Another approach to analyse loops with
concurrency in an actor-based language is done by rely-guarantee reasoning [3].
Those approaches give interesting results on some classes of systems, but they
cannot be directly applied to the π-calculus language we are considering, with
dynamic creation of processes and channels. Moreover, they do not offer the
same compositionality as analysis based on type systems. The paper [16] studies
distributed systems that are comparable to those of [2], and analyses their
complexity by means of a behaviour type system. In a second step, the types
are used to run an analysis that returns complexity bounds. So this approach is
more compositional than that of [2], but still does not apply to our π-calculus
language.

Let us now turn to related works in the setting of the π-calculus or process calculi.
To our knowledge, the first work to study parallel complexity in the π-calculus by
types was given by Kobayashi [27], as another application of his type system
for deadlock freedom, further developed in other papers [30]. In his setting,
channels are typed with usages, which are simple CCS-like processes describing
the behaviour of a channel. In order to carry out complexity analysis, those
usages are annotated with two pieces of time information, obligation and capability. The
obligation level is the time at which a channel is ready to perform an action, and
the capability level is the time at which it successfully finds a communication
partner. We believe that when they are not infinite, the sum of those levels
is related to our own time annotation of channels. The definition of parallel
complexity in this work differs from ours, as it loses some non-deterministic
paths, and the extension with dependent types is suggested but not detailed. It
is not clear to us whether everything can be adapted to reason only about our parallel
complexity, but we plan to study it in future work. More recently, Das et al.
[9,10] proposed a type system with temporal session types to capture several
parallel cost models with the use of a tick constructor. Our usage of time was
inspired by their types with the usual next modality of temporal logic, but
they also use the always and eventually modalities to gain expressivity.
We believe that because our usage of time is more permissive, those modalities
would not be useful in our calculus. Because of session types, they have linearity
for the use of data types such as lists, but they obtain deadlock freedom, contrary
to our calculus. Moreover, they provide decidable operations to simplify the use
of their types, such as subtyping, but they do not define dependent types nor
size types, which are useful to treat data types. Still, they provide a significant
number of examples to show the expressivity of their type system.

The methodology of our work is inspired by implicit computational complexity,
which aims at characterizing complexity classes by means of dedicated
programming languages, mainly in the sequential setting, for instance by
providing languages for FPTIME functions. Some results have been adapted to
the concurrent case, but mainly for the work complexity or for other languages
than the π-calculus, e.g. [32,14,7] (the latter reference is for a higher-order
π-calculus). The paper [13] is closer to our setting as it defines a notion of causal
complexity in the π-calculus and gives a type system characterizing processes with
polynomial complexity. However, contrary to those works, we do not restrict to
a particular complexity class (like FPTIME) and we handle the case of the span.

Technically, the types we use are inspired from linear dependent types [6]. 
Those are one of the many variants of size types, which were introduced in [26]. 


7 Perspectives 


We see several possible future directions to this work: 


— Type inference: we plan to investigate how type inference could be automated
or partially automated for the span type system. We will study typing
by constraint generation and explore whether existing off-the-shelf solvers or
new procedures could solve these constraints. Preliminary results
(see [5]) show that the case of work is manageable, and it generates a set
of constraints close to the one in [4]. However, the case of span could
require more sophisticated reasoning because of the strong distinction between
servers and channels with the advancing of time.

— We have mentioned that our type system for span is not adapted to analyse 
some concurrent systems such as the simple example of the semaphore (Sect. 
4.2). However, we believe that a type system based on an adaptation of 
usages [27,30,29] could be promising for this purpose. 

— It would be challenging to examine whether similar type systems could be 
developed to account for some other complexity properties, for instance to 
extract the number of parallel processes needed to achieve the span.


Abstract. Concurrent accesses to databases are typically encapsulated
in transactions in order to enable isolation from other concurrent compu- 
tations and resilience to failures. Modern databases provide transactions 
with various semantics corresponding to different trade-offs between con- 
sistency and availability. Since a weaker consistency model provides bet- 
ter performance, an important issue is investigating the weakest level of 
consistency needed by a given program (to satisfy its specification). As 
a way of dealing with this issue, we investigate the problem of checking 
whether a given program has the same set of behaviors when replacing 
a consistency model with a weaker one. This property known as robust- 
ness generally implies that any specification of the program is preserved 
when weakening the consistency. We focus on the robustness problem 
for consistency models which are weaker than standard serializability, 
namely, causal consistency, prefix consistency, and snapshot isolation. 
We show that checking robustness between these models is polynomial 
time reducible to a state reachability problem under serializability. We 
use this reduction to also derive a pragmatic proof technique based on 
Lipton’s reduction theory that allows to prove programs robust. We have 
applied our techniques to several challenging applications drawn from the 
literature of distributed systems and databases. 


Keywords: Transactional databases - Weak consistency - Program verification 


1 Introduction 


Concurrent accesses to databases are typically encapsulated in transactions in or- 
der to enable isolation from other concurrent computations and resilience to fail- 
ures. Modern databases provide transactions with various semantics correspond- 
ing to different tradeoffs between consistency and availability. The strongest 
consistency level is achieved with serializable transactions whose outcome 
in concurrent executions is the same as if the transactions were executed atom- 
ically in some order. Since serializability (SER) carries a significant penalty on 
availability, modern databases often provide weaker consistency models, e.g.,
causal consistency (CC) [38], prefix consistency (PC) [25], and snapshot
isolation (SI). Causal consistency requires that if a transaction $t_1$ "affects"
another transaction $t_2$, e.g., $t_1$ executes before $t_2$ in the same session or $t_2$ reads
a value written by $t_1$, then the updates in these two transactions are observed
by any other transaction in this order. Concurrent transactions, which are not 
causally related to each other, can be observed in different orders, leading to 
behaviors that are not possible under SER. Prefix consistency requires that there 
is a total commit order between all the transactions such that each transaction 
observes all the updates in a prefix of this sequence (PC is stronger than CC). 
Two transactions can observe the same prefix, which leads to behaviors that 
are not admitted by SER. Snapshot isolation further requires that two different 
transactions observe different prefixes if they both write to a common variable. 
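A behaviour of this kind can be illustrated with the classic write-skew pattern, which is not taken from this paper: two transactions read from the same prefix (here, the initial state) and write to different variables. The Python sketch below, with hypothetical names throughout, compares the outcome of such an execution with the outcomes of all serial orders. It only models the "same snapshot" situation and is not an implementation of any of the consistency models.

```python
from itertools import permutations
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], State]  # maps the observed state to the writes performed

INITIAL: State = {"x": 0, "y": 0}


def t1(view: State) -> State:
    """If y is still 0, set x to 1."""
    return {"x": 1} if view["y"] == 0 else {}


def t2(view: State) -> State:
    """If x is still 0, set y to 1."""
    return {"y": 1} if view["x"] == 0 else {}


def run_serial(order: List[Txn]) -> State:
    """Each transaction observes all updates of the previous ones."""
    state = dict(INITIAL)
    for txn in order:
        state.update(txn(state))
    return state


def run_same_snapshot(txns: List[Txn]) -> State:
    """All transactions observe the same prefix: the initial state."""
    writes = [txn(dict(INITIAL)) for txn in txns]
    written = [key for w in writes for key in w]
    # the transactions write to different variables, so no write conflict arises
    assert len(written) == len(set(written))
    state = dict(INITIAL)
    for w in writes:
        state.update(w)
    return state


serial_outcomes = [run_serial(list(order)) for order in permutations([t1, t2])]
skew_outcome = run_same_snapshot([t1, t2])
```

Here `skew_outcome` is `{"x": 1, "y": 1}`, which no serial order of `t1` and `t2` produces, so this outcome is not admitted under SER even though the two transactions never write to a common variable.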

Since a weaker consistency model provides better performance, an important 
issue is identifying the weakest level of consistency needed by a program (to sat- 
isfy its specification). One way to tackle this issue is checking whether a program 
P designed under a consistency model S' has the same behaviors when run under 
a weaker consistency model W. This property of a program is generally known 
as robustness against substituting S with W. It implies that any specification of 
P is preserved when weakening the consistency model (from S to W). Preserving 
any specification is convenient since specifications are rarely present in practice. 

The problem of checking robustness for a given program has been investi- 
gated in several recent works, but only when the stronger model (S) is SER, 
e.g., g 119; | [40], or sequential consistency in the non-transactional 
case, e.g. 29|. However, there is a large class of specifications that can 
be implemented even in the presence of “anomalies”, i.e., behaviors which are 
not admitted under SER (see for a discussion). In this context, an impor- 
tant question is whether a certain implementation (program) is robust against 
substituting a weak consistency model, e.g., SI, with a weaker one, e.g., CC. 

In this paper, we consider the sequence of increasingly strong consistency 
models mentioned above, CC, PC, and SI, and investigate the problem of checking 
robustness for a given program against weakening the consistency model to one 
in this range. We study the asymptotic complexity of this problem and propose 
effective techniques for establishing robustness based on abstraction. There are 
two important cases to consider: robustness against substituting SI with PC and 
PC with CC, respectively. Robustness against substituting SI with CC can be 
obtained as the conjunction of these two cases. 

In the first case (SI vs PC), checking robustness for a program P is reduced to 
a reachability (assertion checking) problem in a composition of P under PC with 
a monitor that checks whether a PC behavior is an “anomaly”, i.e., admitted by 
P under PC, but not under SI. This approach raises two non-trivial challenges: 
(1) defining a monitor for detecting PC vs SI anomalies that uses a minimal 
amount of auxiliary memory (to remember past events), and (2) determining 
the complexity of checking if the composition of P with the monitor reaches 
a specific control location¹ under the (weaker) model PC. Interestingly enough,
we address these two challenges by studying the relationship between these two
weak consistency models, PC and SI, and serializability. The construction of the
monitor is based on the fact that the PC vs SI anomalies can be defined as,
roughly, the difference between the PC vs SER and SI vs SER anomalies
(investigated in previous work [13]), and we show that the reachability problem
under PC can be reduced to a reachability problem under SER. These results lead
to a polynomial-time reduction of this robustness problem (for arbitrary programs)
to a reachability problem under SER, which is important from a practical point
of view since the SER semantics (as opposed to the PC or SI semantics) can
be encoded easily in existing verification tools (using locks to guard the
isolation of transactions). These results also enable a precise characterization
of the complexity class of this problem.

1 We assume that the monitor goes to an error location when detecting an anomaly.

Checking robustness against substituting PC with CC is reduced to the prob- 
lem of checking robustness against substituting SER with CC. The latter has been 
shown to be polynomial-time reducible to reachability under SER in [10]. This 
surprising result relies on the reduction from PC reachability to SER reachability 
mentioned above. This reduction shows that a given program P reaches a cer- 
tain control location under PC iff a transformed program P’, where essentially, 
each transaction is split in two parts, one part containing all the reads, and one 
part containing all the writes, reaches the same control location under SER. Since 
this reduction preserves the structure of the program, CC vs PC anomalies of a 
program P correspond to CC vs SER anomalies of the transformed program P’. 

Beyond enabling these reductions, the characterization of classes of anomalies 
or the reduction from the PC semantics to the SER semantics are also important 
for a better understanding of these weak consistency models and the differences 
between them. We believe that these results can find applications beyond ro- 
bustness checking, e.g., verifying conformance to given specifications. 

As a more pragmatic approach for establishing robustness, which avoids a 
non-reachability proof under SER, we have introduced a proof methodology that 
builds on Lipton’s reduction theory and the concept of commutativity
dependency graph introduced in [9], which represents mover type dependencies
between the transactions in a program. We give sufficient conditions for robust- 
ness in all the cases mentioned above, which characterize the commutativity 
dependency graph associated to a given program. 

We tested the applicability of these verification techniques on a benchmark 
containing seven challenging applications extracted from previous work 
[19]. These techniques are precise enough for proving or disproving the robustness 
of all these applications, for all combinations of the consistency models. 

Complete proofs and more details can be found in (1. 


2 Overview 


We give an overview of the robustness problems investigated in this paper, dis- 
cussing first the case PC vs. CC, and then SI vs PC. We end with an example that 
illustrates the robustness checking technique based on commutativity arguments. 
(a) FusionTicket.

    Process 1                          Process 2
    CreateEvent(v, e1, 3):             CreateEvent(v, e2, 3):
    [ Tickets[v][e1] := 3 ]            [ Tickets[v][e2] := 3 ]

    CountTickets(v):                   CountTickets(v):
    [ r := Σ_e Tickets[v][e] ]         [ r := Σ_e Tickets[v][e] ]

(b) A CC trace of FusionTicket (edges labeled PO and HB).

    CreateEvent(v,e1,3)                CreateEvent(v,e2,3)
    CountTickets(v) //r=3              CountTickets(v) //r=3

(c) Twitter.

    Process 1                          Process 2
    Register(u, p1):                   Register(u, p2):
    [ r := RegisteredUsers[u]          [ r := RegisteredUsers[u]
      assume r == 0                      assume r == 0
      RegisteredUsers[u] := 1            RegisteredUsers[u] := 1
      Password[u] := p1 ]                Password[u] := p2 ]

(d) A CC and PC trace of Twitter (edges labeled HB).

    Register(u,p1)                     Register(u,p2)

(e) Transformed Twitter.

    Process 1                          Process 2
    RegisterRd(u, p1):                 RegisterRd(u, p2):
    [ r := RegisteredUsers[u]          [ r := RegisteredUsers[u]
      assume r == 0 ]                    assume r == 0 ]

    RegisterWr(u, p1):                 RegisterWr(u, p2):
    [ RegisteredUsers[u] := 1          [ RegisteredUsers[u] := 1
      Password[u] := p1 ]                Password[u] := p2 ]

(f) A CC and SER trace of transformed Twitter (edges labeled PO and HB).

    RegisterRd(u,p1)                   RegisterRd(u,p2)
    RegisterWr(u,p1)                   RegisterWr(u,p2)

(g) Betting.

    Process 1                    Process 2                    Process 3
    PlaceBet(1,2):               PlaceBet(2,3):               SettleBet():
    [ assume time < TIMEOUT      [ assume time < TIMEOUT      [ Bets’ := Bets
      Bets[1] := 2 ]               Bets[2] := 3 ]               n := Bets’.Length
                                                                assume time > TIMEOUT & n > 0
                                                                select i s.t. Bets’[i] ≠ ⊥
                                                                return := Bets’[i] ]

(h) A PC and SI trace of Betting.

    PlaceBet(1,2)        SettleBet() // return=2        PlaceBet(2,3)

(i) Commutativity dependency graph of Betting.

    PlaceBet(1,2)        SettleBet()        PlaceBet(2,3)


Fig. 1: Transactional programs and traces under different consistency models.


Robustness PC vs CC. We illustrate the robustness against substituting PC with
CC using the FusionTicket and the Twitter programs in Figure 1a and Figure 1c,
respectively. FusionTicket manages tickets for a number of events, each event 
being associated with a venue. Its state consists of a two-dimensional map that 
stores the number of tickets for an event in a given venue (r is a local variable, 
and the assignment in CountTickets is interpreted as a read of the shared state). 
The program has two processes and each process contains two transactions. The 
first transaction creates an event e in a venue v with a number of tickets n, 
and the second transaction computes the total number of tickets for all the 
events in a venue v. A possible candidate for a specification of this program is 
that the values computed in CountTickets are monotonically increasing since 
each such value is computed after creating a new event. Twitter provides a
transaction for registering a new user with a given username and password, 
which is executed by two parallel processes. Its state contains two maps that 
record whether a given username has been registered (0 and 1 stand for non- 
registered and registered, respectively) and the password for a given username. 
Each transaction first checks whether a given username is free (see the assume 
statement). The intended specification is that the user must be registered with 
the given password when the registration transaction succeeds. 


A program is robust against substituting PC with CC if its set of behaviors 
under the two models coincide. We model behaviors of a given program as traces, 
which record standard control-flow and data-flow dependencies between trans- 
actions, e.g., the order between transactions in the same session and whether 
a transaction reads the value written by another (read-from). The transitive 
closure of the union of all these dependency relations is called happens-before. 
Figure 1b pictures a trace of FusionTicket where the concrete values which are
read in a transaction are written under comments. In this trace, each process 
registers a different event but in the same venue and with the same number of 
tickets, and it ignores the event created by the other process when computing 
the sum of tickets in the venue. 


Figure 1b pictures a trace of FusionTicket under CC, which is a witness
that FusionTicket is not robust against substituting PC with CC. This trace 
is also a violation of the intended specification since the number of tickets is 
not increasing (the sum of tickets is 3 in both processes). The happens-before 
dependencies (pictured with HB labeled edges) include the program-order PO 
(the order between transactions in the same process), and read-write depen- 
dencies, since an instance of CountTickets(v) does not observe the value writ- 
ten by the CreateEvent transaction in the other process (the latter overwrites 
some value that the former reads). This trace is allowed under CC because the 
transaction CreateEvent(v, e1, 3) executes concurrently with the transaction
CountTickets(v) in the other process, and similarly for CreateEvent(v, e2, 3).
However, it is not allowed under PC since it is impossible to define a total
commit order between CreateEvent(v, e1, 3) and CreateEvent(v, e2, 3) that
justifies the reads of both CountTickets(v) transactions (these reads should
correspond to the updates in a prefix of this order). For instance, assuming that
CreateEvent(v, e1, 3) commits before CreateEvent(v, e2, 3), CountTickets(v) in
the second process must observe the effect of CreateEvent(v, e1, 3) as well since
it observes the effect of CreateEvent(v, e2, 3). However, this contradicts the fact
that CountTickets(v) computes the sum of tickets as being 3. 
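
The argument above can be replayed mechanically. The following Python sketch is only an illustration under simplifying assumptions (the dictionary encoding and the name `pc_justifiable` are hypothetical and not part of the paper's tooling): it enumerates every commit order of the two CreateEvent transactions and, for each CountTickets, every prefix of that order containing the CreateEvent of its own process, and checks whether the observed sums can be explained.

```python
from itertools import permutations

# Hypothetical encoding: each process creates one event with 3 tickets.
CREATES = {"p1": ("e1", 3), "p2": ("e2", 3)}
# Values returned by CountTickets in each process in the trace of Fig. 1b.
OBSERVED = {"p1": 3, "p2": 3}


def pc_justifiable(creates, observed):
    """Return True iff some commit order and per-reader prefixes explain the reads."""
    for order in permutations(creates):  # candidate total commit orders
        explained = True
        for reader, value in observed.items():
            # A reader observes a prefix that contains at least the
            # CreateEvent of its own process (session order).
            own = order.index(reader)
            sums = [
                sum(creates[p][1] for p in order[:k])
                for k in range(own + 1, len(order) + 1)
            ]
            if value not in sums:
                explained = False
                break
        if explained:
            return True
    return False


print(pc_justifiable(CREATES, OBSERVED))
```

Under this encoding no commit order explains both reads returning 3, which matches the informal argument; changing one observed value to 6 makes the trace explainable.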


On the other hand, Twitter is robust against substituting PC with CC. For
instance, Figure 1d pictures a trace of Twitter under CC, where the assume
statements in both transactions pass. In this trace, the transactions
Register(u,p1) and Register(u,p2) execute concurrently and are unaware of each
other’s writes (they are not causally related). The HB dependencies include
write-write dependencies since both transactions write on the same location (we
consider the transaction in Process 2 to be the last one writing to the Password
map), and read-write dependencies since each transaction reads RegisteredUsers
that is written by the
other. This trace is also allowed under PC since the commit order can be defined 
such that Register(u,p1) is ordered before Register(u,p2), and then both trans- 
actions read from the initial state (the empty prefix). Note that this trace has a 
cyclic happens-before which means that it is not allowed under serializability. 


Checking robustness PC vs CC. We reduce the problem of checking robustness 
against substituting PC with CC to the robustness problem against substituting 
SER with CC (the latter reduces to a reachability problem under SER [10]). This
reduction relies on a syntactic program transformation that rewrites PC behav- 
iors of a given program P to SER behaviors of another program P’. The program 
P’ is obtained by splitting each transaction t of P into two transactions: the first 
transaction performs all the reads in t and the second performs all the writes 
in t (the two are related by program order). Figure 1e shows this
transformation applied on Twitter. The trace in Figure 1f is a serializable
execution of the transformed Twitter which is “observationally” equivalent to
the trace in Figure 1d of the original Twitter, i.e., each read of the shared
state returns the
same value and the writes on the shared state are applied in the same order 
(the acyclicity of the happens-before shows that this is a serializable trace). The 
transformed FusionTicket coincides with the original version because it contains 
no transaction that both reads and writes on the shared state. 
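
As a rough illustration of the syntactic transformation described above, the following Python sketch splits a transaction into a read part and a write part. The list-of-pairs encoding of instructions and the function name `split_transaction` are assumptions made for this sketch; only the Rd/Wr naming follows Figure 1e.

```python
def split_transaction(name, instructions):
    """Split one transaction into a read part and a write part.

    `instructions` is a list of (kind, text) pairs with kind in
    {"read", "assume", "write"}; this encoding is an assumption of the sketch.
    """
    reads = [(k, t) for k, t in instructions if k in ("read", "assume")]
    writes = [(k, t) for k, t in instructions if k == "write"]
    if not reads or not writes:
        # Read-only and write-only transactions are left unchanged.
        return [(name, instructions)]
    # The two parts stay in the same process, related by program order.
    return [(name + "Rd", reads), (name + "Wr", writes)]


REGISTER = [
    ("read", "r := RegisteredUsers[u]"),
    ("assume", "r == 0"),
    ("write", "RegisteredUsers[u] := 1"),
    ("write", "Password[u] := p"),
]
CREATE_EVENT = [("write", "Tickets[v][e] := 3")]

for part_name, body in split_transaction("Register", REGISTER):
    print(part_name, body)
print(split_transaction("CreateEvent", CREATE_EVENT))
```

Register is split into RegisterRd and RegisterWr, while the write-only CreateEvent is returned as is, mirroring the observation that the transformed FusionTicket coincides with the original.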

We show that PC behaviors and SER behaviors of the original and transformed 
program, respectively, are related by a bijection. In particular, we show that any 
PC vs. CC robustness violation of the original program manifests as a SER vs. CC 
robustness violation of the transformed program, and vice-versa. For instance, 
the CC trace of the original Twitter in Figure 1d corresponds to the CC trace of
the transformed Twitter in Figure 1f, and the acyclicity of the latter (the fact
that it is admitted by SER) implies that the former is admitted by the original
Twitter under PC. On the other hand, the trace in Figure 1b is also a CC trace of
the transformed FusionTicket and its cyclicity implies that it is not admitted by
FusionTicket under PC, and thus, it represents a robustness violation.


Robustness SI vs PC. We illustrate the robustness against substituting SI
with PC using Twitter and the Betting program in Figure 1g. Twitter is not
robust against substituting SI with PC, the trace in Figure 1d being a witness
violation. This trace is also a violation of the intended specification since one of
the users registers a password that is overwritten in a concurrent transaction.
This PC trace is not possible under SI because Register(u,p1) and Register(u,p2)
observe the same prefix of the commit order (i.e., an empty prefix), but they
write to a common memory location Password[u], which is not allowed under SI.

On the other hand, the Betting program in Figure 1g, which manages a set of
bets, is robust against substituting SI with PC. The first two processes execute 
one transaction that places a bet of a value v with a unique bet identifier id, 
assuming that the bet expiration time is not yet reached (bets are recorded in 
the map Bets). The third process contains a single transaction that settles the 
betting assuming that the bet expiration time was reached and at least one bet 
has been placed. This transaction starts by taking a snapshot of the Bets map 
into a local variable Bets’, and then selects a random non-null value (different
from ⊥) in the map to correspond to the winning bet. The intended specification
of this program is that the winning bet corresponds to a genuine bet that was
placed. Figure 1h pictures a PC trace of Betting where SettleBet observes only the
bet of the first process PlaceBet(1,2). The HB dependency towards the second 
process denotes a read-write dependency (SettleBet reads a cell of the map Bets 
which is overwritten by the second process). This trace is allowed under SI 
because no two transactions write to the same location. 


Checking robustness SI vs PC. We reduce robustness against substituting
SI with PC to a reachability problem under SER. This reduction is based on a
characterization of happens-before cycles² that are possible under PC but not SI,
and the transformation described above that allows us to simulate the PC
semantics of a program on top of SER. The former is used to define an
instrumentation (monitor) for the transformed program that reaches an error state
iff the original program is not robust. Specifically, we show that the
happens-before cycles in PC traces that are not admitted by SI must contain a
transaction that (1) overwrites a value written by another transaction in the
cycle and (2) reads a value overwritten by another transaction in the cycle. For
instance, the trace of Twitter in Figure 1d is not allowed under SI because
Register(u,p2) overwrites a value written by Register(u,p1) (the password) and
reads a value overwritten by Register(u,p1) (checking whether the username u is
registered). The trace of Betting in Figure 1h is allowed under SI because its
happens-before is acyclic.


Checking robustness using commutativity arguments. Based on the re- 
ductions above, we propose an approximated method for proving robustness 
based on the concept of mover in Lipton’s reduction theory [89]. A transaction 
is a left (resp., right) mover if it commutes to the left (resp., right) of another 
transaction (by a different process) while preserving the computation. We use 
the notion of mover to characterize the data-flow dependencies in the happens- 
before. Roughly, there exists a data-flow dependency between two transactions 
in some execution if one doesn’t commute to the left/right of the other one. 

We define a commutativity dependency graph which summarizes the happens- 
before dependencies in all executions of a transformed program (obtained by 
splitting the transactions of the original program as explained above), and de- 
rive a proof method for robustness which inspects paths in this graph. Two 
transactions t1 and t2 are linked by a directed edge iff t1 cannot move to the
right of t2 (or t2 cannot move to the left of t1), or if they are related by the
program order. Moreover, two transactions t1 and t2 are linked by an undirected
edge iff they are the result of splitting the same transaction. 

A program is robust against substituting PC with CC if, roughly, its
commutativity dependency graph does not contain a simple cycle of directed edges
with two distinct transactions t1 and t2, such that t1 does not commute left
because of another transaction t3 in the cycle that reads a variable that t1
writes to, and t2 does not commute right because of another transaction t4 in the
cycle (t3 and t4 can coincide) that writes to a variable that t2 either reads
from or writes to³. For instance, Figure 1i shows the commutativity dependency
graph of the transformed Betting program, which coincides with the original
Betting because PlaceBet(1,2) and PlaceBet(2,3) are write-only transactions and
SettleBet() is a read-only transaction. Both simple cycles in Figure 1i contain
just two transactions and therefore do not meet the criterion above, which
requires at least 3 transactions. Therefore, Betting is robust against
substituting PC with CC.

2 Traces with an acyclic happens-before are not robustness violations because they
are admitted under serializability, which implies that they are admitted under the
weaker model SI as well.


⟨prog⟩    ::= program ⟨process⟩*
⟨process⟩ ::= process ⟨pid⟩ regs ⟨reg⟩* ⟨txn⟩*
⟨txn⟩     ::= begin ⟨read⟩* ⟨test⟩* ⟨write⟩* commit
⟨read⟩    ::= ⟨label⟩: ⟨reg⟩ := ⟨var⟩; goto ⟨label⟩;
⟨test⟩    ::= ⟨label⟩: assume ⟨bexpr⟩; goto ⟨label⟩;
⟨write⟩   ::= ⟨label⟩: ⟨var⟩ := ⟨reg-expr⟩; goto ⟨label⟩;


Fig. 2: The syntax of our programming language. a* indicates zero or more
occurrences of a. ⟨pid⟩, ⟨reg⟩, ⟨label⟩, and ⟨var⟩ represent a process identifier,
a register, a label, and a shared variable, respectively. ⟨reg-expr⟩ is an
expression over registers while ⟨bexpr⟩ is a Boolean expression over registers, or
the non-deterministic choice *.

A program is robust against substituting SI with PC if, roughly, its
commutativity dependency graph does not contain a simple cycle with two successive
transactions t1 and t2 that are linked by an undirected edge, such that t1 does
not commute left because of another transaction t3 in the cycle that writes to
a variable that t1 writes to, and t2 does not commute right because of another
transaction t4 in the cycle (t3 and t4 can coincide) that writes to a variable that
t2 reads from⁴. Betting is also robust against substituting SI with PC for the
same reason (simple cycles of size 2).

3 The transactions t1, t2, t3, and t4 of the PC vs CC criterion correspond to t1,
ti, tn, and ti+1, respectively, in the theorem that formalizes this criterion.

4 The transactions t1, t2, t3, and t4 correspond to t1, t2, tn, and t3,
respectively, in Theorem 7.


3 Consistency Models


Syntax. We present our results in the context of the simple programming
language defined in Figure 2, where a program is a parallel composition of
processes distinguished using a set of identifiers P. A process is a sequence of
transactions and each transaction is a sequence of labeled instructions. A
transaction starts with a begin instruction and finishes with a commit
instruction. Instructions include assignments to a process-local register from a
set R or to a shared variable from a set V, or an assume. The assignments use
values from a data domain D. An assignment to a register ⟨reg⟩ := ⟨var⟩ is called
a read of the shared variable ⟨var⟩ and an assignment to a shared variable
⟨var⟩ := ⟨reg⟩ is called a write to the shared variable ⟨var⟩. The assume ⟨bexpr⟩
blocks the process if the Boolean expression ⟨bexpr⟩ over registers is false. It
can be used to model conditionals. The goto statement transfers the control to the
program location (instruction) specified by a given label. Since multiple
instructions can have the same label, goto statements can be used to mimic
imperative constructs like loops and conditionals inside transactions.

We assume w.l.o.g. that every transaction is written as a sequence of reads or 
assume statements followed by a sequence of writes (a single goto statement from 
the sequence of read/assume instructions transfers the control to the sequence 
of writes). In the context of the consistency models we study in this paper, every 
program can be equivalently rewritten as a set of transactions of this form. 

To simplify the technical exposition, programs contain a bounded number of 
processes and each process executes a bounded number of transactions. A trans- 
action may execute an unbounded number of instructions but these instructions 
concern a bounded number of variables, which makes it impossible to model SQL 
(select /update) queries that may access tables with a statically unknown num- 
ber of rows. Our results can be extended beyond these restrictions as explained 
in Remark 1 and Remark 2.

Semantics. We describe the semantics of a program under four consistency
models, i.e., causal consistency⁵ (CC), prefix consistency (PC), snapshot isolation
(SI), and serializability (SER).

In the semantics of a program under CC, shared variables are replicated across 
each process, each process maintaining its own local valuation of these variables. 
During the execution of a transaction in a process, its writes are stored in a 
transaction log that can be accessed only by the process executing the transaction 
and that is broadcasted to all the other processes at the end of the transaction. 
To read a shared variable x, a process p first accesses its transaction log and 
takes the last written value on x, if any, and then its own valuation of the 
shared variable, if x was not written during the current transaction. Transaction 
logs are delivered to every process in an order consistent with the causal relation 
between transactions, i.e., the transitive closure of the union of the program order 
(the order in which transactions are executed by a process), and the read-from 
relation (a transaction t1 reads-from a transaction t2 iff t1 reads a value that
was written by t2). When a process receives a transaction log, it immediately 
applies it on its shared-variable valuation. 

In the semantics of a program under PC and SI, shared variables are stored
in a central memory and each process keeps a local valuation of these variables.
When a process starts a new transaction, it fetches a consistent snapshot of the
shared variables from the central memory and stores it in its local valuation
of these variables. During the execution of a transaction in a process, writes
to shared variables are stored in the local valuation of these variables, and in a
transaction log. To read a shared variable, a process takes its own valuation of the
shared variable. A process commits a transaction by applying the updates in the
transaction log on the central memory in an atomic way (to make them visible
to all processes). Under SI, when a process applies the writes in a transaction
log on the central memory, it must ensure that there were no concurrent writes
that occurred after the last fetch from the central memory to a shared variable
that was written during the current transaction. Otherwise, the transaction is
aborted and its effects discarded.

5 We consider a variation known as causal convergence.
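
The difference between the PC and SI commit rules can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the paper's formal semantics: the class `CentralMemory`, its per-variable version counters, and the helper `two_increments` are hypothetical names introduced here, and each transaction runs in a single step between its fetch and its commit.

```python
class CentralMemory:
    """Toy central memory with snapshot fetch and atomic commit."""

    def __init__(self, initial):
        self.values = dict(initial)
        self.version = {x: 0 for x in initial}  # commit stamp of last write
        self.clock = 0                          # number of commits so far

    def begin(self):
        # Fetch a consistent snapshot and remember when it was taken.
        return {"local": dict(self.values), "start": self.clock, "log": {}}

    @staticmethod
    def read(txn, x):
        return txn["local"][x]

    @staticmethod
    def write(txn, x, v):
        txn["local"][x] = v  # visible to later reads of this transaction
        txn["log"][x] = v    # applied to the central memory at commit

    def commit(self, txn, model):
        # Under SI, abort if a variable in the log was written after the fetch.
        if model == "SI" and any(self.version[x] > txn["start"] for x in txn["log"]):
            return False
        self.clock += 1
        for x, v in txn["log"].items():
            self.values[x] = v
            self.version[x] = self.clock
        return True


def two_increments(model):
    """Two concurrent transactions both read x and write x + 1."""
    mem = CentralMemory({"x": 0})
    t1, t2 = mem.begin(), mem.begin()
    for t in (t1, t2):
        mem.write(t, "x", mem.read(t, "x") + 1)
    return mem.commit(t1, model), mem.commit(t2, model), mem.values["x"]


print(two_increments("PC"))
print(two_increments("SI"))
```

In this sketch both transactions commit under PC, whereas under SI the second commit is rejected because x was written after its snapshot was fetched.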

In the semantics of a program under SER, we adopt a simple operational 
model where we keep a single shared-variable valuation in a central memory 
(accessed by all processes) with the standard interpretation of read and write 
statements. Transactions execute serially, one after another. 

We use a standard model of executions of a program called trace. A trace 
represents the order between transactions in the same process, and the data-flow 
in an execution using standard happens-before relations between transactions. 
We assume that each transaction in a program is identified uniquely using a
transaction identifier from a set T. Also, f : T → 2^S is a mapping that associates
each transaction in T with a sequence of read and write events from the set

S = {re(t,x,v), we(t,x,v) : t ∈ T, x ∈ V, v ∈ D}

where re(t,x,v) is a read of x returning v, and we(t,x,v) is a write of v to x.


Definition 1. A trace is a tuple τ = (ρ, f, TO, PO, WR, WW, RW) where ρ ⊆ T
is a set of transaction identifiers, and


— TO is a mapping giving the order between events in each transaction, i.e., it
associates each transaction t in ρ with a total order TO(t) on f(t) × f(t).

— PO is the program order relation, a strict partial order on ρ × ρ that orders
every two transactions issued by the same process.

— WR is the read-from relation between distinct transactions (t1,t2) ∈ ρ × ρ
representing the fact that t2 reads a value written by t1.

— WW is the store order relation on ρ × ρ between distinct transactions that
write to the same shared variable.

— RW is the conflict order relation between distinct transactions, defined by
RW = WR⁻¹; WW (; denotes the sequential composition of two relations).


For simplicity, for a trace τ = (ρ, f, TO, PO, WR, WW, RW), we write t ∈ τ
instead of t ∈ ρ. We also assume that each trace contains a fictitious
transaction that writes the initial values of all shared variables, and which is
ordered before any other transaction in program order. Also, Tr_X(P) is the set of
traces representing executions of program P under a consistency model X.
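
To make the definition of the conflict order concrete, here is a minimal Python sketch that derives RW from WR and WW, with relations encoded as sets of pairs. The function names and the transaction names (`init`, `t1`, `t4`) are assumptions of this illustration, and the sketch assumes that WR and WW are restricted to a single shared variable.

```python
def inverse(r):
    """Inverse of a binary relation given as a set of pairs."""
    return {(b, a) for (a, b) in r}


def compose(r, s):
    """Sequential composition r; s of two binary relations."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def conflict_order(wr, ww):
    """RW = WR^-1; WW, kept between distinct transactions only."""
    return {(a, c) for (a, c) in compose(inverse(wr), ww) if a != c}


# t4 reads x from the fictitious initial transaction, and t1 overwrites x.
WR_X = {("init", "t4")}
WW_X = {("init", "t1")}
print(conflict_order(WR_X, WW_X))
```

Here t4 reads the value written by `init`, which t1 later overwrites, so the sketch yields a single conflict edge from t4 to t1.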

AxCausal     CO₀⁺ ⊆ CO
AxArb        ARB₀⁺ ⊆ ARB
AxCC         AxRetVal ∧ AxCausal ∧ AxArb
AxPrefix     ARB; CO ⊆ CO
AxPC         AxPrefix ∧ AxCC
AxConflict   WW ⊆ CO
AxSI         AxConflict ∧ AxPC
AxSer        AxRetVal ∧ AxCausal ∧ AxArb ∧ CO = ARB

where CO₀ = PO ∪ WR and ARB₀ = CO ∪ WW, and
AxRetVal = ∀ t ∈ τ. ∀ re(t,x,v) ∈ f(t) we have that

— there exist a transaction t0 = Max_ARB({t' ∈ τ | (t',t) ∈ CO ∧ we(t',x,·) ∈ f(t')})
and an event we(t0,x,v) = Max_TO(t0)({we(t0,x,·) ∈ f(t0)}).


Table 1: Declarative definitions of consistency models. For an order relation ≤,
a = Max_≤(A) iff a ∈ A ∧ ∀ b ∈ A. b ≤ a.


For each X ∈ {CC, PC, SI, SER}, the set of traces Tr_X(P) can be described using
the set of properties in Table 1. A trace τ is possible under causal consistency iff
there exist two relations, CO, a partial order (causal order), and ARB, a total
order (arbitration order) that includes CO, such that the properties AxCausal,
AxArb, and AxRetVal hold [16]. AxCausal guarantees that the program order and
the read-from relation are included in the causal order, and AxArb guarantees
that the causal order and the store order are included in the arbitration order.
AxRetVal guarantees that a read returns the value written by the last write in the 
last transaction that contains a write to the same variable and that is ordered 
by CO before the read’s transaction. We use AxCC to denote the conjunction
of these three properties. A trace τ is possible under prefix consistency iff there
exist a causal order CO and an arbitration order ARB such that AxCC holds
and the property AxPrefix holds as well [27]. AxPrefix guarantees that every
transaction observes a prefix of transactions that are ordered by ARB before
it. We use AxPC to denote the conjunction of AxCC and AxPrefix. A trace τ
is possible under snapshot isolation iff there exist a causal order CO and an
arbitration order ARB such that AxPC holds and the property AxConflict holds
[27]. AxConflict guarantees that if two transactions write to the same variable
then one of them must observe the other. We use AxSI to denote the conjunction
of AxPC and AxConflict. A trace τ is serializable iff there exist a causal order CO
and an arbitration order ARB such that the property AxSer holds, which implies
that the two relations CO and ARB coincide. Note that for any given program
P, Tr_SER(P) ⊆ Tr_SI(P) ⊆ Tr_PC(P) ⊆ Tr_CC(P). Also, the four consistency models
we consider disallow anomalies such as dirty and phantom reads.

For a given trace τ = (ρ, f, TO, PO, WR, WW, RW), the happens-before order
is the transitive closure of the union of all the relations in the trace, i.e., HB =
(PO ∪ WR ∪ WW ∪ RW)⁺. A classic result states that a trace τ is serializable
iff HB is acyclic [47]. Note that the acyclicity of HB implies that WW is a total
order between transactions that write to the same variable, and that (PO ∪ WR)⁺
and (PO ∪ WR ∪ WW)⁺ are acyclic.
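
The acyclicity criterion is easy to check on small traces. The Python sketch below is an illustration, not the paper's algorithm: the function name `is_serializable` and the short transaction names are hypothetical, and the edge sets are read off the informal description of the Twitter traces in the Overview (Figures 1d and 1f), which is an assumption of this example. It runs a depth-first search for a cycle in PO ∪ WR ∪ WW ∪ RW.

```python
def is_serializable(txns, po, wr, ww, rw):
    """Return True iff the union PO ∪ WR ∪ WW ∪ RW has no cycle."""
    succ = {t: set() for t in txns}
    for a, b in po | wr | ww | rw:
        succ[a].add(b)
    state = {t: 0 for t in txns}  # 0 = unvisited, 1 = on stack, 2 = done

    def visit(t):
        state[t] = 1
        for u in succ[t]:
            if state[u] == 1 or (state[u] == 0 and visit(u)):
                return True  # back edge found: HB is cyclic
        state[t] = 2
        return False

    return not any(state[t] == 0 and visit(t) for t in txns)


# Original Twitter: each Register reads a value overwritten by the other.
twitter = is_serializable(
    ["R1", "R2"],
    po=set(), wr=set(),
    ww={("R1", "R2")},
    rw={("R1", "R2"), ("R2", "R1")},
)

# Transformed Twitter: the read parts precede both write parts.
transformed = is_serializable(
    ["Rd1", "Wr1", "Rd2", "Wr2"],
    po={("Rd1", "Wr1"), ("Rd2", "Wr2")}, wr=set(),
    ww={("Wr1", "Wr2")},
    rw={("Rd1", "Wr2"), ("Rd2", "Wr1")},
)
print(twitter, transformed)
```

With these edge sets the original trace has a cyclic happens-before and the transformed one does not, in line with the discussion of Figures 1d and 1f.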


3.1 Robustness 


In this work, we investigate the problem of checking whether a program P under
a semantics Y ∈ {PC, SI} produces the same set of traces as under a weaker
semantics X ∈ {CC, PC}. When this holds, we say that P is robust against X
relative to Y.

Definition 2. A program P is called robust against a semantics
X ∈ {CC, PC, SI} relative to a semantics Y ∈ {PC, SI, SER} such that Y is
stronger than X iff Tr_X(P) = Tr_Y(P).


If P is not robust against X relative to Y then there must exist a trace
τ ∈ Tr_X(P) \ Tr_Y(P). We say that τ is a robustness violation trace.

We illustrate the notion of robustness on the programs in Figure 3, which are
commonly used in the literature. In all programs, transactions of the same process
are aligned vertically and ordered from top to bottom. Each read instruction is
commented with the value it reads in some execution.

The store buffering (SB) program in Figure 3a contains four transactions that are
issued by two distinct processes. We emphasize an execution where t2 reads 0 from
y and t4 reads 0 from x. This execution is allowed under CC since the two writes
by t1 and t3 are not causally dependent. Thus, t2 and t4 are executed without
seeing the writes from t3 and t1, respectively. However, this execution is not
feasible under PC (which implies that it is not feasible under either SI or SER).
In particular, we can have neither (t1,t3) ∈ ARB nor (t3,t1) ∈ ARB, which
contradicts the fact that ARB is a total order. For example, if (t1,t3) ∈ ARB, then
(t1,t4) ∈ CO (since ARB; CO ⊆ CO), which contradicts the fact that t4 does not
see t1. Similarly, (t3,t1) ∈ ARB implies that (t3,t2) ∈ CO, which contradicts the
fact that t2 does not see t3. Thus, SB is not robust against CC relative to PC.

The lost update (LU) program in Figure 3b has two transactions that are
issued by two distinct processes. We highlight an execution where both
transactions read 0 from x. This execution is allowed under PC since both
transactions are not causally dependent and can be executed in parallel by the two
processes. However, it is not allowed under SI since both transactions write to a
common variable (i.e., x). Thus, they cannot be executed in parallel and one of
them must see the write of the other. Thus, LU is not robust against PC relative
to SI.

The write skew (WS) program in Figure 3c has two transactions that are
issued by two distinct processes. We highlight an execution where t1 reads 0
from x and t2 reads 0 from y. This execution is allowed under SI since both
transactions are not causally dependent, do not write to a common variable,
and can be executed in parallel by the two processes. However, this execution
is not allowed under SER since one of the two transactions must see the write of
the other. Thus, WS is not robust against SI relative to SER.


(a) Store Buffering (SB), with PO and RW edges.

    t1 [x := 1]              [y := 1] t3
    t2 [r1 := y] //0         [r2 := x] //0 t4

(b) Lost Update (LU), with RW and WW edges.

    t1 [r1 := x //0          [r2 := x //0 t2
        x := r1 + 1]          x := r2 + 1]

(c) Write Skew (WS), with RW edges.

    t1 [r1 := x //0          [r2 := y //0 t2
        y := 1]               x := 1]

(d) Message Passing (MP).

    t1 [x := 1]              [r1 := y] //1 t3
    t2 [y := 1]              [r2 := x] //1 t4


Fig. 3: Litmus programs
The message passing (MP) program in Figure 3d has four transactions issued
by two processes. Because t1 and t2 are causally dependent, under any semantics
X ∈ {CC, PC, SI, SER} we only have three possible executions of MP: either
t3 and t4 do not observe the writes of t1 and t2, or t3 and t4 observe the
writes of both t1 and t2, or only t4 observes the write of t1 (we highlight
the values read in the second case in Figure 3d). Therefore, the executions of
this program under the four consistency models coincide. Thus, MP is robust
against CC relative to any other model.


4 Robustness Against CC Relative to PC 


We show that checking robustness against CC relative to PC can be reduced to
checking robustness against CC relative to SER. The crux of this reduction is a
program transformation that makes it possible to simulate the PC semantics of a
program P using the SER semantics of a program P♣. Checking robustness against
CC relative to SER can be reduced in polynomial time to reachability under
SER [10].

Given a program P with a set of transactions Tr(P), we define a program
P♣ such that every transaction t ∈ Tr(P) is split into a transaction t[r] that
contains all the read/assume statements in t (in the same order) and another
transaction t[w] that contains all the write statements in t (in the same
order). We establish the following result:


Theorem 1. A program P is robust against CC relative to PC iff P♣ is robust
against CC relative to SER.


Intuitively, under PC, processes can execute concurrent transactions that fetch
the same consistent snapshot of the shared variables from the central memory
and subsequently commit their writes. Decoupling the read part of a transaction
from the write part makes it possible to simulate such behaviors even under SER.
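
The splitting of transactions can be sketched in a few lines of Python. The representation of a program as a dictionary from process names to lists of transactions, and of a statement as a `(kind, text)` pair, is an assumption of this sketch, as are the function names. Empty parts are dropped, so that singleton transactions are left unchanged.

```python
def split_transaction(name, statements):
    """Split one transaction into its read part t[r] and its write part t[w].

    `statements` is a list of (kind, text) pairs, with kind in
    {"read", "assume", "write"} (a hypothetical encoding).
    The relative order of statements is preserved inside each part.
    """
    read_part = [s for s in statements if s[0] in ("read", "assume")]
    write_part = [s for s in statements if s[0] == "write"]
    return [(f"{name}[r]", read_part), (f"{name}[w]", write_part)]


def split_program(program):
    """Split every transaction of every process, keeping program order."""
    result = {}
    for process, transactions in program.items():
        split = []
        for name, statements in transactions:
            # t[r] is issued before t[w]; empty parts are dropped.
            split.extend(
                (n, body) for n, body in split_transaction(name, statements) if body
            )
        result[process] = split
    return result


# The lost update (LU) program: each transaction reads x and then writes x.
lu = {
    "p1": [("t1", [("read", "r1 := x"), ("write", "x := r1 + 1")])],
    "p2": [("t2", [("read", "r2 := x"), ("write", "x := r2 + 1")])],
}
lu_split = split_program(lu)
```

In the result, each process of LU issues the read part of its transaction followed by the write part, as two separate transactions.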

The proof of this theorem relies on several intermediate results concerning
the relationship between traces of P and P♣. Let τ = (ρ, PO, WR, WW, RW) ∈
Tr_X(P) be a trace of a program P under a semantics X. We define the trace
τ♣ = (ρ♣, PO♣, WR♣, WW♣, RW♣) where every transaction t ∈ τ is split into
two transactions t[r] ∈ τ♣ and t[w] ∈ τ♣, and the dependency relations are
straightforward adaptations, i.e.,


– PO♣ is the smallest transitive relation that includes (t[r], t[w]) for every t,
  and (t[w], t′[r]) if (t, t′) ∈ PO,
– (t′[w], t[r]) ∈ WR♣, (t′[w], t[w]) ∈ WW♣, and (t′[r], t[w]) ∈ RW♣ if (t′, t) ∈
  WR, (t′, t) ∈ WW, and (t′, t) ∈ RW, respectively.
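
The adaptation of the dependency relations can be sketched as follows. This is only an illustration under stated assumptions: relations are encoded as sets of pairs of transaction names, every transaction is assumed to have both a read and a write part, and the direction of the WW dependency in the LU trace (t1 before t2) is chosen arbitrarily.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing `edges` (naive fixpoint)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def has_cycle(edges):
    """A relation has a cycle iff its transitive closure relates a node to itself."""
    return any(a == b for a, b in transitive_closure(edges))


def split_trace(transactions, po, wr, ww, rw):
    """Build the relations of the split trace from those of the original trace."""
    def r(t):
        return f"{t}[r]"

    def w(t):
        return f"{t}[w]"

    po_split = {(r(t), w(t)) for t in transactions} | {(w(a), r(b)) for a, b in po}
    return {
        "PO": transitive_closure(po_split),
        "WR": {(w(a), r(b)) for a, b in wr},
        "WW": {(w(a), w(b)) for a, b in ww},
        "RW": {(r(a), w(b)) for a, b in rw},
    }


# LU trace where both transactions read 0 from x.
rw = {("t1", "t2"), ("t2", "t1")}
ww = {("t1", "t2")}
lu_has_cycle = has_cycle(rw | ww)

split = split_trace(["t1", "t2"], po=set(), wr=set(), ww=ww, rw=rw)
split_has_cycle = has_cycle(split["PO"] | split["WR"] | split["WW"] | split["RW"])
```

The happens-before cycle of the LU trace disappears after splitting, which matches the intuition that the split trace is serializable whenever the original trace is PC.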
For instance, Figure 4 pictures the trace τ♣ for the LU trace τ given in
Figure 3b. For traces τ of programs that contain singleton transactions, e.g.,
SB in Figure 3a, τ♣ coincides with τ.

Conversely, for a given trace τ♣ = (ρ♣, PO♣, WR♣, WW♣, RW♣) ∈ Tr_X(P♣)
of a program P♣ under a semantics X, we define the trace τ =
(ρ, PO, WR, WW, RW) where every two components t[r] and t[w] are merged into
a transaction t ∈ τ. The dependency relations are defined in a straightforward
way, e.g., if (t′[w], t[w]) ∈ WW♣ then (t′, t) ∈ WW.

    t1[r] [r1 := x] //0        t2[r] [r2 := x] //0
    t1[w] [x := r1 + 1]        t2[w] [x := r2 + 1]

Fig. 4: A trace of the transformed LU program (LU♣).

The following lemma shows that for any semantics X ∈ {CC, PC, SI}, if
τ ∈ Tr_X(P) for a program P, then τ♣ is a valid trace of P♣ under X, i.e.,
τ♣ ∈ Tr_X(P♣). Intuitively, this lemma shows that splitting transactions in a
trace and defining dependency relations appropriately cannot introduce cycles
in these relations and preserves the validity of the different consistency axioms.

The proof of this lemma relies on constructing a causal order CO♣ and an
arbitration order ARB♣ for the trace τ♣ starting from the analogous relations
in τ. In the case of CC, these are the smallest transitive relations such that:

– PO♣ ⊂ CO♣ ⊂ ARB♣, and
– if (t1, t2) ∈ CO then (t1[w], t2[r]) ∈ CO♣, and if (t1, t2) ∈ ARB then
  (t1[w], t2[r]) ∈ ARB♣.

For PC and SI, CO♣ must additionally satisfy: if (t1, t2) ∈ ARB, then
(t1[w], t2[w]) ∈ CO♣. This is required in order to satisfy the axiom AxPrefix,
i.e., ARB♣;CO♣ ⊂ CO♣, when (t1[w], t2[r]) ∈ ARB♣ and (t2[r], t2[w]) ∈ CO♣.

This construction ensures that CO♣ is a partial order and ARB♣ is a total
order because CO is a partial order and ARB is a total order. Also, based on
the above rules, we have that: if (t1[w], t2[r]) ∈ CO♣ then (t1, t2) ∈ CO, and
similarly, if (t1[w], t2[r]) ∈ ARB♣ then (t1, t2) ∈ ARB.


Lemma 1. If τ ∈ Tr_X(P), then τ♣ ∈ Tr_X(P♣).


Before presenting a strengthening of Lemma 1 when X is CC, we give an
important characterization of CC traces. This characterization is stated in terms
of acyclicity properties.


Lemma 2. τ is a trace under CC iff ARB₀⁺ and CO₀⁺;RW are acyclic (ARB₀
and CO₀ are defined in Table 1).


Next we show that a trace τ of a program P is CC iff the corresponding trace
τ♣ of P♣ is CC as well. This result is based on the observation that cycles in
ARB₀⁺ or CO₀⁺;RW cannot be broken by splitting transactions.


Lemma 3. A trace τ of P is CC iff the corresponding trace τ♣ of P♣ is CC.


The following lemma shows that a trace τ is PC iff the corresponding trace τ♣
is SER. The if direction in the proof is based on constructing a causal order CO
and an arbitration order ARB for the trace τ from the arbitration order ARB♣
in τ♣ (since τ♣ is a trace under serializability, CO♣ and ARB♣ coincide). These
are the smallest transitive relations such that:

– if (t1[w], t2[r]) ∈ ARB♣ then (t1, t2) ∈ CO,
– if (t1[w], t2[w]) ∈ ARB♣ then (t1, t2) ∈ ARB.⁶

⁶ If t1[w] is empty (t1 is read-only), then we set (t1, t2) ∈ ARB if
(t1[r], t2[w]) ∈ CO♣. If t2[w] is empty, then (t1, t2) ∈ ARB if
(t1[w], t2[r]) ∈ CO♣. If both t1[w] and t2[w] are empty, then (t1, t2) ∈ ARB
if (t1[r], t2[r]) ∈ CO♣.

The only-if direction is based on the fact that any cycle in the dependency
relations of τ that is admitted under PC (characterized in Lemma 7) is “broken”
by splitting transactions. Also, splitting transactions cannot introduce new cycles
that do not originate in τ.


Lemma 4. A trace τ is PC iff τ♣ is SER.


The lemmas above are used to prove Theorem 1 as follows:


PROOF of Theorem 1. For the if direction, assume by contradiction that P is not
robust against CC relative to PC. Then, there must exist a trace τ ∈ Tr_CC(P) \
Tr_PC(P). Lemmas 3 and 4 imply that the corresponding trace τ♣ of P♣ is CC and
not SER. Thus, P♣ is not robust against CC relative to SER. The only-if direction
is proved similarly.


Robustness against CC relative to SER has been shown to be reducible in
polynomial time to the reachability problem under SER [10]. Given a program P
and a control location ℓ, the reachability problem under SER asks whether there
exists an execution of P under SER that reaches ℓ. Therefore, as a corollary of
Theorem 1, we obtain the following:


Corollary 1. Checking robustness against CC relative to PC is reducible to the 
reachability problem under SER in polynomial time. 


In the following we discuss the complexity of this problem in the case of
finite-state programs (bounded data domain). The upper bound follows from
Corollary 1 and standard results about the complexity of the reachability problem
under sequential consistency, which extend to SER, with a bounded or
parametric number of processes [45]. For the lower bound, given an instance (P, ℓ)
of the reachability problem under sequential consistency, we construct a
program P′ where each statement s of P is executed in a different transaction that
guards⁷ the execution of s using a global lock (the lock can be implemented in
our programming language as usual, e.g., using a busy wait loop for locking),
and where reaching the location ℓ enables the execution of a “gadget” that
corresponds to the SB program in Figure 3a. Executing each statement under a global
lock ensures that every execution of P′ under CC is serializable, and faithfully
represents an execution of P under sequential consistency. Moreover, P reaches
ℓ iff P′ contains a robustness violation, which is due to the SB execution.

Corollary 2. Checking robustness of a program with a fixed number of variables 
and bounded data domain against CC relative to PC is PSPACE-complete when 
the number of processes is bounded and EXPSPACE-complete, otherwise. 


5 Robustness Against PC Relative to SI 


In this section, we show that checking robustness against PC relative to SI can
be reduced in polynomial time to a reachability problem under the SER
semantics. We reuse the program transformation from the previous section, which
makes it possible to simulate PC behaviors on top of SER, and additionally, we
provide a characterization of traces that distinguish the PC semantics from SI.
We use this characterization to define an instrumentation (monitor) that is able
to detect if a program under PC admits such traces.

⁷ That is, the transaction is of the form [lock; s; unlock].

We show that the happens-before cycles in a robustness violation (against PC
relative to SI) must contain a WW dependency followed by a RW dependency,
and they should not contain two successive RW dependencies. This follows from
the fact that every happens-before cycle in a PC trace must contain either two
successive RW dependencies, or a WW dependency followed by a RW dependency.
Otherwise, the happens-before cycle would imply a cycle in the arbitration order.
Then, any trace under PC where all its simple happens-before cycles contain two
successive RW dependencies is possible under SI. For instance, the trace of the
non-robust LU execution in Figure 3b contains a WW dependency followed by a
RW dependency and does not contain two successive RW dependencies, which is
disallowed under SI, while the trace of the robust WS execution in Figure 3c
contains two successive RW dependencies. As a first step, we prove the following
theorem characterizing traces that are allowed under both PC and SI.


Theorem 2. A program P is robust against PC relative to SI iff every happens- 
before cycle in a trace of P under PC contains two successive RW dependencies. 


Before giving the proof of the above theorem, we state several intermediate
results that characterize cycles in PC or SI traces. First, we show that every PC
trace in which all simple happens-before cycles contain two successive RW
dependencies is also an SI trace.


Lemma 5. If a trace τ is PC and all happens-before cycles in τ contain two
successive RW dependencies, then τ is SI.


The proof of Theorem 2 also relies on the following lemma that characterizes
happens-before cycles permissible under SI.


Lemma 6. If a trace τ is SI, then all its happens-before cycles must
contain two successive RW dependencies.


PROOF of Theorem 2. For the only-if direction, if P is robust against PC relative
to SI then every trace τ of P under PC is SI as well. Therefore, by Lemma 6, all
cycles in τ contain two successive RW dependencies, which concludes the proof of
this direction. For the reverse, let τ be a trace of P under PC such that all its
happens-before cycles contain two successive RW dependencies. Then, by Lemma 5,
we have that τ is SI. Thus, every trace τ of P under PC is SI.


Next, we present an important lemma that characterizes happens-before
cycles possible under the PC semantics. This strengthens an earlier result,
which shows that all happens-before cycles under PC must have two successive
dependencies in {RW, WW} and at least one RW. We show that the two successive
dependencies cannot be RW followed by WW, or two successive WW.


Lemma 7. If a trace τ is PC then all happens-before cycles in τ must contain
either two successive RW dependencies or a WW dependency followed by a RW
dependency.

Combining the results of Theorem 2 and Lemmas 4 and 7, we obtain the following
characterization of traces which violate robustness against PC relative to SI.

Theorem 3. A program P is not robust against PC relative to SI iff there exists
a trace τ♣ of P♣ under SER such that the trace τ obtained by merging⁸ read and
write transactions in τ♣ contains a happens-before cycle that does not contain
two successive RW dependencies, and it contains a WW dependency followed by
a RW dependency.
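
The shape of the cycles in this characterization can be checked mechanically on small traces. The following Python sketch enumerates the simple cycles of a labelled dependency graph and tests the two conditions. The encoding of edges as `(source, label, target)` triples and all function names are assumptions of the sketch, and the direction of the WW dependency in the LU trace is chosen arbitrarily.

```python
def simple_cycles(edges):
    """Yield the label sequence of every simple cycle of a labelled graph."""
    nodes = sorted({e[0] for e in edges} | {e[2] for e in edges})
    out = {n: [] for n in nodes}
    for src, label, dst in edges:
        out[src].append((label, dst))

    def extend(start, node, visited, labels):
        for label, nxt in out[node]:
            if nxt == start:
                yield labels + [label]
            elif nxt not in visited and nxt > start:
                yield from extend(start, nxt, visited | {nxt}, labels + [label])

    # Each cycle is reported once, starting from its smallest node.
    for start in nodes:
        yield from extend(start, start, {start}, [])


def successive(labels, first, second):
    """True if an edge labelled `first` is directly followed by one labelled `second`."""
    n = len(labels)
    return any(
        labels[i] == first and labels[(i + 1) % n] == second for i in range(n)
    )


def has_violating_cycle(edges):
    """Some cycle has WW followed by RW and no two successive RW dependencies."""
    return any(
        successive(c, "WW", "RW") and not successive(c, "RW", "RW")
        for c in simple_cycles(edges)
    )


# LU: both transactions read the initial value of x and both write x.
lu = [("t1", "RW", "t2"), ("t2", "RW", "t1"), ("t1", "WW", "t2")]
# WS: t1 reads x and writes y, t2 reads y and writes x.
ws = [("t1", "RW", "t2"), ("t2", "RW", "t1")]

lu_violates = has_violating_cycle(lu)
ws_violates = has_violating_cycle(ws)
```

The LU trace contains the cycle formed by the WW dependency and one RW dependency, so it is flagged, whereas the only cycle of the WS trace consists of two successive RW dependencies.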


The results above enable a reduction from checking robustness against PC relative
to SI to a reachability problem under the SER semantics. For a program P, we
define an instrumentation denoted by ⟦P⟧, such that P is not robust against
PC relative to SI iff ⟦P⟧ violates an assertion under SER. The instrumentation
consists in rewriting every transaction of P as shown in Figure 6.

The instrumentation ⟦P⟧ running under SER simulates the PC semantics of P
using the same idea of decoupling the execution of the read part of a
transaction from the write part. It violates an assertion when it simulates a PC
trace containing a happens-before cycle as in Theorem 3. The execution
corresponding to this trace has the shape given in Figure 5, where t# is the
transaction that occurs between the WW and the RW dependencies, and every
transaction executed after t# (this can be a full transaction in P, or only the
read or write part of a transaction in P) is related by a happens-before path
to t# (otherwise, the execution of this transaction can be reordered to occur
before t#). A transaction in P can have its read part included in α and the
write part included in β or γ. Also, β and γ may contain transactions in P that
executed only their read part. It is possible that t0 = t, β = γ = ε, and
α = ε (the LU program shown in Figure 3b is an example where this can happen).
The instrumentation uses auxiliary variables to track happens-before
dependencies, which are explained below.

    α   t#   β   t0   γ   t
    (RW dependency from t# to t0, happens-before path HB from t0 to t,
     WW dependency from t to t#)

Fig. 5: Execution simulating a violation of robustness against PC relative to SI.

The instrumentation executes (incomplete) transactions without affecting
the auxiliary variables (without tracking happens-before dependencies) (lines 3
and 5) until a non-deterministically chosen point in time when it declares the
current transaction as the candidate for t# (line 9). Only one candidate for t#
can be chosen during the execution. This transaction executes only its reads and
it chooses non-deterministically a variable that it could write as a witness for
the WW dependency (see lines 16–22). The name of this variable is stored in
a global variable varW (see the definition of I#( x := e )). The writes are not
applied on the shared memory. Intuitively, t# should be thought of as a
transaction whose writes are delayed for later, after transaction t in Figure 5
has executed. The instrumentation checks that t# and t can be connected by some
happens-before path that includes the RW and WW dependencies, and that does not
contain two consecutive RW dependencies. If this is the case, it violates an
assertion at the commit point of t. Since the write part of t# is intuitively
delayed to execute after t, the process executing t# is disabled all along the
execution (see the assume false).


⁸ This transformation has been defined at the beginning of Section 4.
Transaction “begin (read)* (test)* (write)* commit” is rewritten to:

 1  if ( !done# )
 2    if ( * )
 3      begin <read>* <test>* commit
 4      if ( !done# )
 5        begin <write>* commit
 6      else
 7        I(begin) (I(<write>))* I(commit)
 8    else
 9      begin (I#(<read>))* <test>* (I#(<write>))* I#(commit)
10      assume false;
11  else if ( * )
12    rdSet' := ∅;
13    wrSet' := ∅;
14    I(begin) (I(<read>))* <test>* I(commit)
15    I(begin) (I(<write>))* I(commit)

I#( r := x ):
16  r := x;
17  hbR['x'] := 0;
18  rdSet := rdSet ∪ { 'x' };

I#( x := e ):
19  if ( varW == ⊥ and * )
20    varW := 'x';

I#( commit ):
21  assume ( varW != ⊥ )
22  done# := true

I( begin ):
23  begin
24  hb := ⊥
25  if ( hbP != ⊥ and hbP < 2 )
26    hb := 0;
27  else if ( hbP = 2 )
28    hb := 2;

I( commit ):
29  assume ( hb != ⊥ )
30  assert ( hb == 2 or varW ∉ wrSet' );
31  if ( hbP == ⊥ or hbP > hb )
32    hbP = hb;
33  for each 'x' ∈ wrSet'
34    if ( hbW['x'] == ⊥ or hbW['x'] > hb )
35      hbW['x'] = hb;
36  for each 'x' ∈ rdSet'
37    if ( hbR['x'] == ⊥ or hbR['x'] > hb )
38      hbR['x'] = hb;
39  rdSet := rdSet ∪ rdSet';
40  wrSet := wrSet ∪ wrSet';
41  commit

I( r := x ):
42  r := x;
43  rdSet' := rdSet' ∪ { 'x' };
44  if ( 'x' ∈ wrSet )
45    if ( hbW['x'] != 2 )
46      hb := 0
47    else if ( hb == ⊥ )
48      hb := hbW['x']

I( x := e ):
49  x := e;
50  wrSet' := wrSet' ∪ { 'x' };
51  if ( 'x' ∈ wrSet )
52    if ( hbW['x'] != 2 )
53      hb := 0
54    else if ( hb == ⊥ )
55      hb := hbW['x']
56  if ( 'x' ∈ rdSet )
57    if ( hb = ⊥ or hb > hbR['x'] + 1 )
58      hb := min(hbR['x'] + 1, 2)


Fig. 6: A program instrumentation for checking robustness against PC relative
to SI. The auxiliary variables used by the instrumentation are shared variables,
except for hbP, rdSet', and wrSet', which are process-local variables, and they
are initially set to ⊥. This instrumentation uses program constructs which can
be defined as syntactic sugar from the syntax presented in Section 3, e.g.,
if-then-else statements (outside transactions).


After choosing the candidate for t#, the instrumentation uses the auxiliary
variables for tracking happens-before dependencies. Therefore, rdSet and wrSet
record variables read and written, respectively, by transactions that are
connected by a happens-before path to t# (in a trace of P). This is ensured by
the assume at line 29. During the execution, the variables read or written by a
transaction⁹ that writes a variable in rdSet (see line 56), or reads or writes a
variable in wrSet (see lines 44 and 51), will be added to these sets (see lines 39
and 40). Since the variables that t# writes in P are not recorded in wrSet, these
happens-before paths must necessarily start with a RW dependency (from t#).
When the assertion fails (line 30), the condition varW ∈ wrSet' ensures that the
current transaction has a WW dependency towards the write part of t# (the
current transaction plays the role of t in Figure 5).

⁹ These are stored in the local variables rdSet' and wrSet' while the transaction is
running.

The rest of the instrumentation checks that there exists a happens-before
path from t# to t that does not include two consecutive RW dependencies,
called a SI¬ path. This check is based on the auxiliary variables whose name is
prefixed by hb and which take values in the domain {⊥, 0, 1, 2} (⊥ represents
the initial value). Therefore,


– hbR['x'] (resp., hbW['x']) is 0 iff there exists a transaction t′ that reads
  x (resp., writes to x), such that there exists a SI¬ path from t# to t′ that
  ends with a dependency which is not RW,

– hbR['x'] (resp., hbW['x']) is 1 iff there exists a transaction t′ that reads
  x (resp., writes to x) that is connected to t# by a SI¬ path, and every SI¬
  path from t# to a transaction t″ that reads x (resp., writes to x) ends with
  an RW dependency,

– hbR['x'] (resp., hbW['x']) is 2 iff there exists no SI¬ path from t# to a
  transaction t′ that reads x (resp., writes to x).


The local variable hbP has the same interpretation, except that t′ and t″ are
instantiated over transactions in the same process (that already executed) instead
of transactions that read or write a certain variable. Similarly, the variable hb
is a particular case where t′ and t″ are instantiated to the current transaction.
The violation of the assertion at line 30 implies that hb is 0 or 1, which means
that there exists a SI¬ path from t# to t.

During each transaction that executes after t#, the variable hb characterizing
happens-before paths that end in this transaction is updated every time a new
happens-before dependency is witnessed (using the values of the other variables).
For instance, when witnessing a WR dependency (line 44), if there exists a SI¬
path to a transaction that writes to x, then the path that continues with the
WR dependency towards the current transaction is also a SI¬ path, and the
last dependency of this path is not RW. Therefore, hb is set to 0 (see line 46).
Otherwise, if every path to a transaction that writes to x is not a SI¬ path,
then every path that continues to the current transaction (by taking the WR
dependency) remains a non-SI¬ path, and hb is set to the value of hbW['x'],
which is 2 in this case (see line 48). Before ending a transaction, the value of hb
can be used to modify the hbR, hbW, and hbP variables, but only if those variables
contain bigger values (see lines 31–38).
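
The update rules for hb can be replayed on the LU program with a short Python sketch. It follows one reading of lines 44–48 and 51–58 of Figure 6, in which the initial value of the hb variables is modelled by `None`; the function names and the dictionary-based encoding of the auxiliary variables are assumptions of this illustration.

```python
BOT = None  # models the initial value of the hb variables


def on_read(hb, x, wr_set, hbW):
    """Update of hb when the current transaction reads x (lines 44-48)."""
    if x in wr_set:
        if hbW[x] != 2:
            hb = 0            # the path ends with a WR dependency
        elif hb is BOT:
            hb = hbW[x]       # only paths with two consecutive RW are known
    return hb


def on_write(hb, x, wr_set, rd_set, hbW, hbR):
    """Update of hb when the current transaction writes x (lines 51-58)."""
    if x in wr_set:
        if hbW[x] != 2:
            hb = 0            # the path ends with a WW dependency
        elif hb is BOT:
            hb = hbW[x]
    if x in rd_set:
        # The path ends with an RW dependency; entries of tracked variables
        # are never BOT here, so the addition is well defined.
        if hb is BOT or hb > hbR[x] + 1:
            hb = min(hbR[x] + 1, 2)
    return hb


# State after the candidate transaction t1 of LU executed its read of x
# and chose x as the witness of the WW dependency (lines 16-20).
rd_set, wr_set = {"x"}, set()
hbR, hbW = {"x": 0}, {}
var_w = "x"

# Transaction t2 of LU: r2 := x; x := r2 + 1.
hb = BOT                                           # hbP is still unset
hb = on_read(hb, "x", wr_set, hbW)                 # x is not in wrSet
hb = on_write(hb, "x", wr_set, rd_set, hbW, hbR)   # RW dependency from t1
wr_set_local = {"x"}

# Assertion checked at the commit of t2 (line 30).
assertion_holds = (hb == 2) or (var_w not in wr_set_local)
```

In this run hb ends with the value 1, i.e., the only path from the candidate to the current transaction ends with an RW dependency, and the assertion is violated, which is consistent with LU not being robust against PC relative to SI.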

The correctness of the instrumentation is stated in the following theorem. 


Theorem 4. A program P is robust against PC relative to SI iff the
instrumentation in Figure 6 does not violate an assertion when executed under SER.


Theorem 4 implies the following complexity result for finite-state programs.
The lower bound is proved similarly to the case CC vs. PC.


Corollary 3. Checking robustness of a program with a fixed number of variables
and bounded data domain against PC relative to SI is PSPACE-complete when
the number of processes is bounded and EXPSPACE-complete, otherwise.


Checking robustness against CC relative to SI can also be shown to be
reducible (in polynomial time) to a reachability problem under SER by combining
the results of checking robustness against CC relative to PC and PC relative to SI.


Theorem 5. A program P is robust against CC relative to SI iff P is robust 
against CC relative to PC and P is robust against PC relative to SI. 


Remark 1. Our reductions of robustness checking to reachability apply to an
extension of our programming language where the number of processes is
unbounded and each process can execute an arbitrary number of times a statically
known set of transactions. This holds because the instrumentation in Figure 6
and the one in [10] (for the case CC vs. SER) consist in adding a set of
instructions that manipulate a fixed set of process-local or shared variables,
which do not store process or transaction identifiers. These reductions extend
also to SQL queries that access unbounded size tables. Rows in a table can be
interpreted as memory locations (identified by primary keys in unbounded
domains, e.g., integers), and SQL queries can be interpreted as instructions that
read/write a set of locations in one shot. These possibly unbounded sets of
locations can be represented symbolically using the conditions in the SQL queries
(e.g., the condition in the WHERE part of a SELECT). The instrumentation in
Figure 6 needs to be adapted so that read and write sets are updated by adding
sets of locations for a given instruction (represented symbolically as mentioned
above).


6 Proving Robustness Using Commutativity Dependency 
Graphs 


We describe an approximate technique for proving robustness, which leverages
the concept of left/right mover in Lipton’s reduction theory [89]. This technique
reasons on the commutativity dependency graph [9] associated with the
transformation P♣ of an input program P that makes it possible to simulate the
PC semantics under serializability (we use a slight variation of the original
definition of this class of graphs). We characterize robustness against CC
relative to PC and PC relative to SI in terms of certain properties that (simple)
cycles in this graph must satisfy.

We recall the concept of movers and the definition of commutativity
dependency graphs. Given a program P and a trace τ = t1 · ... · tn ∈ Tr_SER(P)
of P under serializability, we say that ti ∈ τ moves right (resp., left) in τ if
t1 · ... · ti−1 · ti+1 · ti · ti+2 · ... · tn (resp., t1 · ... · ti−2 · ti · ti−1 ·
ti+1 · ... · tn) is also a valid execution of P, ti and ti+1 (resp., ti−1) are
executed by distinct processes, and both traces reach the same end state. A
transaction t ∈ Tr(P) is not a right (resp., left) mover iff there exists a trace
τ ∈ Tr_SER(P) such that t ∈ τ and t doesn’t move right (resp., left) in τ. Thus,
when a transaction t is not a right mover then there must exist another
transaction t′ ∈ τ which caused t to not be permutable to the right (while
preserving the end state). Since t and t′ do not commute, this must be because
of either a write-read, write-write, or a read-write dependency relation between
the two transactions. We say that t is not a right mover because of t′ and a
dependency relation that is either write-read, write-write, or read-write. Notice
that when t is not a right mover because of t′ then t′ is not a left mover
because of t.

We define M_WR as a binary relation between transactions such that (t, t′) ∈
M_WR when t is not a right mover because of t′ and a write-read dependency (t′
reads some value written by t). We define the relations M_WW and M_RW
corresponding to write-write and read-write dependencies in a similar way. We call
M_WR, M_WW, and M_RW non-mover relations.
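
As an illustration, commutativity of two transactions can be tested by brute force over a small data domain. The Python sketch below is a simplification and not the definition used above: it models a transaction as a function on a state dictionary (an assumption of this sketch), quantifies over all initial states rather than over reachable traces, and ignores which processes issue the transactions, so it can only over-approximate the non-mover relations.

```python
from itertools import product


def commute(t, u, variables, domain):
    """True if `t; u` and `u; t` reach the same end state from every initial state."""
    for values in product(domain, repeat=len(variables)):
        init = dict(zip(variables, values))
        if u(t(dict(init))) != t(u(dict(init))):
            return False
    return True


# Transactions of the MP program, acting on shared variables x, y
# and on the register r2 (all over the domain {0, 1}).
def write_x(state):
    state["x"] = 1
    return state


def write_y(state):
    state["y"] = 1
    return state


def read_x(state):
    state["r2"] = state["x"]
    return state


variables = ["x", "y", "r2"]
domain = (0, 1)

# [x := 1] and [r2 := x] do not commute: the value read depends on the order.
wx_rx = commute(write_x, read_x, variables, domain)
# [x := 1] and [y := 1] access different variables and commute.
wx_wy = commute(write_x, write_y, variables, domain)
```

The pair formed by the write to x and the read of x fails the test, which corresponds to a write-read non-mover dependency in one direction and a read-write dependency in the other.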

The commutativity dependency graph of a program P is a graph where
vertices represent transactions in P. Two vertices are linked by a program order
edge if the two transactions are executed by the same process. The other edges
in this graph represent the “non-mover” relations M_WR, M_WW, and M_RW. Two
vertices that represent the two components t[w] and t[r] of the same transaction
t (already linked by a PO edge) are also linked by an undirected edge labeled by
STO (same-transaction relation).


Our results about the robustness of a program P are stated over a slight
variation of the commutativity dependency graph of P♣ (where a transaction is
either read-only or write-only). This graph contains additional undirected edges
that link every pair of transactions t[r] and t[w] of P♣ that were originally
components of the same transaction t in P. Given such a commutativity
dependency graph, the robustness of P is implied by the absence of cycles of
specific shapes. These cycles can be seen as an abstraction of potential robustness
violations for the respective semantics (see Theorem 6 and Theorem 7). Figure 7
pictures the commutativity dependency graph for the MP program. Since every
transaction in MP is singleton, the two programs MP and MP♣ coincide.

    t1[w] [x = 1]        t3[r] [r1 = y]
    t2[w] [y = 1]        t4[r] [r2 = x]

Fig. 7: The commutativity dependency graph of the MP program.

Using the characterization of robustness violations against CC relative to SER
from [10] and the reduction in Theorem 1, we obtain the following result
concerning the robustness against CC relative to PC.


Theorem 6. Given a program P, if the commutativity dependency graph of the
program P♣ does not contain a simple cycle formed by t1 ··· ti ··· tn such that:
– (tn, t1) ∈ M_RW;
– (tj, tj+1) ∈ (PO ∪ WR)*, for j ∈ [1, i − 1];
– (ti, ti+1) ∈ (M_RW ∪ M_WW);
– (tj, tj+1) ∈ (M_RW ∪ M_WW ∪ M_WR ∪ PO), for j ∈ [i + 1, n − 1];

then P is robust against CC relative to PC.


Next we give the characterization of commutativity dependency graphs required 
for proving robustness against PC relative to SI. 
Theorem 7. Given a program P, if the commutativity dependency graph of the
program P♣ does not contain a simple cycle formed by t1 ··· tn such that:
– (tn, t1) ∈ M_WW, (t1, t2) ∈ STO, and (t2, t3) ∈ M_RW;
– (tj, tj+1) ∈ (M_RW ∪ M_WW ∪ M_WR ∪ PO ∪ STO)*, for j ∈ [3, n − 1];
– ∀ j ∈ [2, n − 2].
  • if (tj, tj+1) ∈ M_RW then (tj+1, tj+2) ∈ (M_WR ∪ PO ∪ M_WW);
  • if (tj+1, tj+2) ∈ M_RW then (tj, tj+1) ∈ (M_WR ∪ PO);
– ∀ j ∈ [3, n − 3]. if (tj+1, tj+2) ∈ STO and (tj+2, tj+3) ∈ M_RW then
  (tj, tj+1) ∈ M_WW;

then P is robust against PC relative to SI.


In Figure 7, we have three simple cycles in the graph:


– (t1[w], t4[r]) ∈ M_WR and (t4[r], t1[w]) ∈ M_RW,
– (t2[w], t3[r]) ∈ M_WR and (t3[r], t2[w]) ∈ M_RW,
– (t1[w], t2[w]) ∈ PO, (t2[w], t3[r]) ∈ M_WR, (t3[r], t4[r]) ∈ PO, and
  (t4[r], t1[w]) ∈ M_RW.

Notice that none of the cycles satisfies the properties in Theorems 6 and 7.
Therefore, MP is robust against CC relative to PC and against PC relative to SI.
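
The cycle shape of Theorem 6 can be checked mechanically once the simple cycles of the graph are known. The following Python sketch covers only Theorem 6. It represents a cycle by the list of its edge labels and reads the WR edges of the theorem as M_WR edges of the graph; both choices, and the SB cycle used for contrast, are assumptions of the sketch rather than material taken from the figures.

```python
def matches_theorem6(labels):
    """Check whether a simple cycle has the shape excluded by Theorem 6.

    `labels` lists the edge labels along the cycle; the last label is the
    edge that closes the cycle. All rotations of the cycle are tried.
    """
    n = len(labels)
    for shift in range(n):
        rot = labels[shift:] + labels[:shift]
        if rot[-1] != "MRW":          # (tn, t1) must be an M_RW edge
            continue
        for i in range(n - 1):        # position of the edge (ti, ti+1)
            prefix_ok = all(l in ("PO", "MWR") for l in rot[:i])
            pivot_ok = rot[i] in ("MRW", "MWW")
            suffix_ok = all(
                l in ("MRW", "MWW", "MWR", "PO") for l in rot[i + 1:n - 1]
            )
            if prefix_ok and pivot_ok and suffix_ok:
                return True
    return False


# The three simple cycles of the MP graph listed above.
mp_cycles = [
    ["MWR", "MRW"],
    ["MWR", "MRW"],
    ["PO", "MWR", "PO", "MRW"],
]
mp_robust_cc_pc = not any(matches_theorem6(c) for c in mp_cycles)

# A cycle one would expect in the graph of SB (derived by hand for this
# sketch): t1 -PO-> t2 -M_RW-> t3 -PO-> t4 -M_RW-> t1.
sb_cycle_matches = matches_theorem6(["PO", "MRW", "PO", "MRW"])
```

None of the MP cycles has the excluded shape, since each of them contains a single M_RW edge and no M_WW edge, whereas the hand-derived SB cycle does match it.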


Remark 2. For programs that contain an unbounded number of processes, an
unbounded number of instantiations of a fixed number of process “templates”,
or unbounded loops with bodies that contain entire transactions, a sound
robustness check consists in applying Theorem 6 and Theorem 7 to (bounded)
programs that contain two copies of each process template, and where each
loop is unfolded exactly two times. This holds because the mover relations are
“static”, i.e., they do not depend on the context in which the transactions
execute, and each cycle requiring more than two process instances or more than
two loop iterations can be short-circuited to a cycle that exists also in the
bounded program. Every outgoing edge from a third instance/iteration can also be
taken from the second instance/iteration. Two copies/iterations are necessary in
order to discover cycles between instances of the same transaction (the cycles in
Theorem 6 and Theorem 7 are simple and cannot contain the same transaction
twice). These results extend easily to SQL queries as well because the notion of
mover is independent of particular classes of programs or instructions.


7 Experimental Evaluation 


We evaluated our approach for checking robustness on 7 applications extracted
from the literature on databases and distributed systems, and an application
Betting designed by ourselves. Two applications were extracted from the
OLTPBench benchmark [30]: a vote recording application (Vote) and a consumer
review application (Epinions). Three applications were obtained from GitHub
projects (used also in [9, 19]): a distributed lock application for the
Cassandra database (CassandraLock [24]), an application for recording trade
activities (SimpleCurrencyExchange [48]), and a micro social media application
(Twitter [49]). The last two applications are a movie ticketing application
(FusionTicket) [84], and a user subscription application inspired by the Twitter
application (Subscription). Each application consists of a set of SQL
transactions that can be called an arbitrary number of times from an arbitrary
number of processes. For instance, Subscription provides an AddUser transaction
for adding a new user with a given username and password, and a RemoveUser
transaction for removing an existing user. (The examples in Figure 1 are
particular variations of FusionTicket, Twitter, and Betting.) We considered five
variations of the robustness problem: the three robustness problems we studied in
this paper along with robustness against SI relative to SER and against CC
relative to SER. The artifacts are available in a GitHub repository [31].


Table 2: Results of the experiments. A column titled X-Y reports whether the
application is robust against X relative to Y.


                                        Robustness
Application             Transactions    CC-PC   PC-SI   CC-SI   SI-SER   CC-SER

Betting                 2               yes     yes     yes     yes      yes
CassandraLock           3               yes     yes     yes     yes      yes
Epinions                8               no      yes     no      yes      no
FusionTicket            3               no      no      no      yes      no
SimpleCurrencyExchange  4               yes     yes     yes     yes      yes
Subscription            2               yes     no      no      yes      no
Twitter                 3               no      no      no      yes      no
Vote                    1               yes     yes     yes     no       no


In the first part of the experiments, we check for robustness violations in
bounded-size executions of a given application. For each application, we have
constructed a client program with a fixed number of processes (2) and a fixed
number of transactions of the corresponding application (at most 2 transactions
per process). For each program and pair of consistency models, we check for
robustness violations using the reductions to reachability under SER presented
in Section 4 and Section 5 in the case of pairs of weak consistency models, and
the reductions from prior work when checking for robustness relative to SER.

We check for reachability (assertion violations) using the Boogie program 
verifier [8]. We model tables as unbounded maps in Boogie and SQL queries as 
first-order formulas over these maps (that may contain existential or universal 
quantifiers). To model the uniqueness of primary keys we use Boogie linear types. 

Table 2 reports the results of this experiment (cells filled with “no”). Five
applications are not robust against at least one of the semantics relative to some
other stronger semantics. The runtimes (wall-clock times) for the robustness
checks are all under one second, and the memory consumption is around 50
Megabytes. Concerning scalability, the reductions to reachability presented in
Section 4 and Section 5 show that checking robustness is as hard as checking
reachability (the size of the instrumented program is only linear in the size of
the original program). Therefore, checking robustness will also suffer from the
classic state explosion problem when increasing the number of processes. On the
other hand, increasing the number of transactions in a process does not seem to
introduce a large overhead. Increasing the number of transactions per process in
the clients of Epinions, FusionTicket, and SimpleCurrencyExchange from 2 to 5
introduces a running time overhead of at most 25%.


10 The Twitter client in Table 2, which is not PC vs CC robust, is different from the one
described in Section 2. This client program consists of two processes, each executing
FollowUser and AddTweet.
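The summary counts quoted for Table 2 can be cross-checked mechanically. The sketch below transcribes the table into a Python dictionary (the names `TABLE_2`, `non_robust_apps`, and `composition_consistent` are introduced here for illustration only) and recomputes which applications have at least one “no” entry. It also checks an observation about the transcribed entries, namely that each CC-SI entry equals the conjunction of the CC-PC and PC-SI entries, and each CC-SER entry the conjunction of CC-SI and SI-SER; this is a consistency check on the data, not a claim taken from this section.

```python
# Table 2 transcribed: a column "X-Y" means robustness against X relative to Y.
COLUMNS = ("CC-PC", "PC-SI", "CC-SI", "SI-SER", "CC-SER")

TABLE_2 = {
    "Betting":                (True,  True,  True,  True,  True),
    "CassandraLock":          (True,  True,  True,  True,  True),
    "Epinions":               (False, True,  False, True,  False),
    "FusionTicket":           (False, False, False, True,  False),
    "SimpleCurrencyExchange": (True,  True,  True,  True,  True),
    "Subscription":           (True,  False, False, True,  False),
    "Twitter":                (False, False, False, True,  False),
    "Vote":                   (True,  True,  True,  False, False),
}


def non_robust_apps(table):
    """Applications that have at least one "no" entry."""
    return sorted(app for app, row in table.items() if not all(row))


def composition_consistent(row):
    """Check that the wider-gap columns agree with the conjunction of the
    columns they span (an observation about the data, see the lead-in)."""
    r = dict(zip(COLUMNS, row))
    return (r["CC-SI"] == (r["CC-PC"] and r["PC-SI"])
            and r["CC-SER"] == (r["CC-SI"] and r["SI-SER"]))


violating = non_robust_apps(TABLE_2)
fully_robust = sorted(set(TABLE_2) - set(violating))
consistent = all(composition_consistent(row) for row in TABLE_2.values())

print(len(violating), violating)
print(len(fully_robust), fully_robust)
print(consistent)
```

Under this transcription, five applications have a “no” entry and three are robust in every column.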


All the robustness violations we report correspond to violations of the in- 
tended specifications. For instance: (1) the robustness violation of Epinions 
against CC relative to PC allows two users to update their ratings for a given 
product and then when each user queries the overall rating of this product they 
do not observe the latest rating that was given by the other user, (2) the ro- 
bustness violation of Subscription against PC relative to SI allows two users to 
register new accounts with the same identifier, and (3) the robustness violation 
of Vote against SI relative to SER allows the same user to vote twice. The
specification violation in Twitter was reported in [19]. However, it was
reported as a violation of a different robustness property (CC relative to SER),
while our work shows that the violation persists when replacing a weak
consistency model (e.g., SI) with a weaker one (e.g., CC). This implies that this
specification violation is not present under SI (since it appears in the
difference between CC and SI behaviors), which cannot be deduced from previous
work.


In the second part of the experiments, we used the technique described in
Section 6, based on commutativity dependency graphs, to prove robustness. For
each application (set of transactions) we considered a program that, for each
ordered pair of (possibly identical) transactions in the application, contains two
processes executing that pair of transactions. Following Remark 2, the robustness
of such a program implies the robustness of a most general client of the
application that executes each transaction an arbitrary number of times and from
an arbitrary number of processes. We focused on the cases where we could not find
robustness violations in the first part. To build the “non-mover” relations M_WR,
M_WW, and M_RW for the commutativity dependency graph, we use the left/right
mover check provided by the CIVL verifier [83]. The results are reported in
Table 2 (the cells filled with “yes”). We showed that the three applications
Betting, CassandraLock and SimpleCurrencyExchange are robust against any semantics
relative to some other stronger semantics. As mentioned earlier, all these
robustness results are established for arbitrarily large executions and clients
with an arbitrary number of processes. For instance, the robustness of
SimpleCurrencyExchange ensures that when the exchange market owner observes a
trade registered by a user, they also observe all the other trades that were done
by this user in the past.
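The shape of the bounded client used in this second part can be sketched as follows. This is a minimal illustration under one reading of the description above: for every ordered pair of transactions, the program contains two processes, and each of them runs the first transaction followed by the second. The function name `bounded_client` and the list-of-lists representation are assumptions made here, not part of the experimental artifact.

```python
from itertools import product


def bounded_client(transactions, copies=2):
    """Build one program as a list of processes.

    For each ordered pair (t1, t2) of possibly identical transactions, the
    program contains `copies` processes, each executing t1 and then t2.
    """
    program = []
    for t1, t2 in product(transactions, repeat=2):
        for _ in range(copies):
            program.append([t1, t2])
    return program


# Subscription offers two transactions, so there are 2 * 2 ordered pairs
# and 2 * 4 = 8 processes in the bounded program.
subscription = ["AddUser", "RemoveUser"]
program = bounded_client(subscription)
print(len(program))
for process in program:
    print(process)
```

For an application with n transactions this yields 2·n² processes of two transactions each, so the bounded program grows quadratically in the number of transactions.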


In conclusion, our experiments show that the robustness checking techniques
we present are effective in proving or disproving robustness of concrete
applications. Moreover, they show that the robustness property for different
combinations of consistency models is a relevant design principle that can help
in choosing the right consistency model for realistic applications, i.e.,
navigating the trade-off between consistency and performance (in general,
weakening the consistency leads to better performance).


8 Related Work 


The consistency models in this paper were studied in several recent works
[2, 20, 25, 43, 16, 44]. Most of them focused on their operational and axiomatic
formalizations. The formal definitions we use in this paper are based on those
given in [25, 16]. Biswas and Enea show that checking whether an execution
is CC is polynomial time while checking whether it is PC or SI is NP-complete.

The robustness problem we study in this paper has been investigated in the 
context of weak memory models, but only relative to sequential consistency, 
against Release/Acquire (RA), TSO and Power [29]. Checking robustness
against CC and SI relative to SER has been investigated in prior work (e.g., [10]).
In this work, we study the robustness problem between two weak consistency 
models, which poses different non-trivial challenges. In particular, previous work 
proposed reductions to reachability under sequential consistency (or SER) that 
relied on a concept of minimal robustness violations (w.r.t. an operational se- 
mantics), which does not apply in our case. The relationship between PC and 
SER is similar in spirit to the one given by Biswas and Enea in the context 
of checking whether an execution is PC. However, that relationship was proven 
in the context of a “weaker” notion of trace (containing only program order and 
read-from), and it does not extend to our notion of trace. For instance, that 
result does not imply preserving WW dependencies which is crucial in our case. 

Some works describe various over- or under-approximate analyses for checking
robustness relative to SER. Several of them propose static
analysis techniques based on computing an abstraction of the set of
computations, which is used for proving robustness. In particular, some encode
program executions under the weak consistency model using FOL formulas to describe
the dependency relations between actions in the executions. These approaches
may return false alarms due to the abstractions they consider in their encoding.
Note that in this paper, we prove a strengthening of earlier results with
regard to the shape of happens-before cycles allowed under PC.

An alternative to trace-based robustness is state-based robustness, which
requires that the sets of reachable states under the two semantics coincide.
While state-robustness is the necessary and sufficient concept for preserving
state-invariants, its verification, which amounts to computing the set of
reachable states under the weak semantics, is in general a hard problem.
The decidability and the complexity of this problem have been investigated in
the context of relaxed memory models such as TSO and Power, and it has been
shown that it is either decidable but highly complex (non-primitive recursive), or
undecidable [6]. Automatic procedures for approximate reachability/invariant
checking have been proposed using either abstractions or bounded analyses, e.g.,
[7, 28]. Proof methods have also been developed for verifying invariants in
the context of weakly consistent models, e.g., [41, 8]. These methods,
however, do not provide decision procedures.
Abstract. Modularity - the partitioning of software into units of functionality
that interact with each other via interfaces - has been the mainstay
of software development for half a century. In case of the C language,
the main mechanism for modularity is the compilation unit / header file 
abstraction. This paper complements programmatic modularity for C 
with modularity idioms for specification and verification in the context 
of Verifiable C, an expressive separation logic for CompCert Clight. Tech- 
nical innovations include (i) abstract predicate declarations — existential 
packages that combine Parkinson & Bierman’s abstract predicates with 
their client-visible reasoning principles; (ii) residual predicates, which 
help enforcing data abstraction in callback-rich code; and (iii) an appli- 
cation to pure (Smalltalk-style) objects that connects code verification 
to model-level reasoning about features such as subtyping, self, inheri- 
tance, and late binding. We introduce our techniques using concrete ex- 
ample modules that have all been verified using the Coq proof assistant 
and combine to fully linked verified programs using a novel, abstraction- 
respecting component composition rule for Verifiable C. 


Keywords: Verified Software Unit · Abstract Predicate Declaration ·
Residual Predicate · Positive Subtyping · Verified Software Toolchain.


1 Introduction 


Separation logic [61,53] constitutes a powerful framework for verifying functional 
correctness of imperative programs. Foundational implementations in interactive 
proof assistants such as Coq exploit the expressiveness of modern type theory 
to construct semantic models that feature higher-order impredicative quantifica- 
tion, step-indexing, and advanced notions of ghost state [4,36]. On the basis of 
proof rules that are justified w.r.t. the operational semantics of the programming 
language in question, these systems perform symbolic execution and employ mul- 
tiple layers of tactical or computational proof automation to assist the engineer 
in the construction of concrete verification scripts. Perhaps most importantly, 
these implementations integrate software verification and model-level validation, 
by embedding assertions shallowly in the proof assistant’s ambient logic; this 
permits specifications to refer to executable model programs or domain-specific 
constructions that are then amenable to code-independent analysis in Coq. 

To realize the potential of separation logic, such implementations must be 
provided for mainstream languages and compatible with modern software engi- 
neering principles and programming styles. This paper addresses this challenge 
for Verifiable C, the program logic of the Verified Software Toolchain (VST [4]).
We advance Verifiable C’s methodology as follows. 


1. We provide general infrastructure for modular verification of modular pro- 
grams by extending Beringer and Appel’s recent theory of function specifi- 
cation subsumption and intersection specifications [15] to a formal calculus 
for composing verified software units (VSUs) at their specification interface. 
Each VSU equips a compilation unit’s header file with VST specifications of 
its API-exposed functions. Composition of VSUs matches the respective im- 
port and export interfaces, applying subsumption as necessary. Crucially, a 
compilation unit’s private functions remain hidden and only need to be spec- 
ified locally. Composition is compatible with source-level linking for Comp- 
Cert Clight and supports repeated import of library modules (§3). 

2. Utilizing existential abstraction [46] and parametricity, we extend work on 
abstract predicates [56] to provide clients with specification interfaces that 
differ in the degree to which module-internal representation details are re- 
vealed. This flexibility is achieved by codifying how the reasoning principles 
associated with a predicate can be selectively communicated to clients, using 
a device we call (existentially) abstract predicate declarations (APDs) (§4).

3. To investigate specification modularity in the presence of callbacks, we study 
variants of the subject-observer design pattern; we demonstrate that by com- 
plementing a module’s primary predicate with residual predicates, represen- 
tation hiding can be respected even at transient interaction points, where 
an invocation of a module’s operation is interrupted, the module’s invari- 
ant may be violated, and yet its internal state must remain unmodified and 
unexposed until the operation is resumed (§5). 

4. We present a novel approach to foundational reasoning about object prin- 
ciples that modularly separates C code verification from model-level behav- 
ior. Exploiting the theory of positive subtyping [30], we cover subtyping, 
interfaces with multiple implementations, dynamic dispatch, self, and late 
binding, for a simple Smalltalk-style object model with static typing (§6). 


This paper is accompanied by a development in Coq [14] that conservatively 
extends VST with the VSU infrastructure and contains several case studies. In 
addition to the examples detailed in the paper, the Coq code treats (i) the run- 
ning example (“piles”) of Beringer and Appel’s development [15]; we retain their 
ability to substitute representation-altering but specification-preserving imple- 
mentations; (ii) a variant of Barnett and Naumann’s Master-Clock example [12], 
as another example of tightly coupled program units; and (iii) an implementa- 
tion of the Composite design pattern, obtained by transcribing a development 
from the Verifast code base [35]. In addition, a VSU interface that unifies the 
APIs of B*-trees and tries was recently developed by Kravchuk-Kirilyuk [40]. 

To see how APDs build on Parkinson and Bierman’s work, consider a concrete
representation predicate in the style of Reynolds [61]: $\mathsf{list}\ x\ \alpha\ p$ specifies that
address $p$ represents a monotone list $\alpha$ of numbers greater than $x$:


$$\mathsf{list}\ x\ \mathsf{nil}\ p \stackrel{\mathrm{def}}{=} (p = \mathsf{null}) \,\&\, \mathsf{emp} \qquad \mathsf{list}\ x\ (a{::}\alpha)\ p \stackrel{\mathrm{def}}{=} \exists q.\ a > x \,\&\, p \mapsto a, q \star \mathsf{list}\ a\ \alpha\ q.$$

Being defined in terms of $\mapsto$, this definition assumes a specific data layout (a
two-field struct). Representation-specific predicates enable verification of concrete
implementations of operations such as reverse. But a client-facing specification 
of the entire list module should only expose the predicate in its folded form — 
a simple case of an abstract predicate. Indeed, while VST fully supports API 
exposure of structs (incl. stack allocation), all examples in this paper employ an 
essentially “dataless” programming discipline [8,60,37] in which structs are at 
most exposed as forward declarations. Clearly, such programmatic encapsulation 
should not be compromised through the use of concrete predicate definitions. 
To regulate whether a predicate is available in its abstract or unfolded form 
at a particular program point, Parkinson and Bierman employ a notion of scope: 
predicates are available in their unfolded form when in scope and are treated 
symbolically elsewhere. This separation can naturally align with the partitioning 
into compilation units, but is all-or-nothing. But even in the absence of specifica- 
tions, different clients need different interfaces: C developments routinely provide 
multiple header files for a single code unit, differing in the amount to which rep- 
resentational information is exposed. Mundane examples include special-purpose 
interfaces for internal performance monitoring or debugging. Extending this ob- 
servation to specifications means supporting multiple public invariants. Indeed, 
several levels of visibility are already conceivable for our simple list predicate: 


— no (un)folding, no exposed reasoning principles: properties that follow from 
the predicate’s definition cannot be exploited during client-side verification; 

— no (un)folding, but reasoning principles are selectively exposed; for example, 
one may expose the model-level property that $\alpha$ is strictly increasing, or the
fact that the head pointer is null exactly if $\alpha$ is empty;

— the set of exposed reasoning principles includes fold/unfold lemmas (perhaps 
with the least-fixed-point property inherent in the inductive definition of the 
predicate), but the internal representation of nodes is encapsulated using a 
further predicate; hence, implementations are free to select a different struct 
layout, for example by swapping the order of fields; 

— the predicate definition is fully exposed, including the internal data layout. 


APDs support such flexibility by combining zero or more abstract predicate dec- 
larations (no definitions, to maintain implementation-independence) with ax- 
ioms that selectively expose the predicates’ reasoning principles. In parallel to 
programmatic forward declarations, an APD is exported in the specification in- 
terface of an API and is substantiated — in implementation-dependent fashion 
— in the VST proof of the corresponding compilation unit. This substantiation 
includes the validation of the exposed axioms. When specifying the API of a 
module, the engineer may not only refer to any APDs introduced by the module 
in question, but may also assume APDs for data structures provided by other 
modules (whose header files are typically #included in the API in question). 
Matching the APD assumptions and provisions of different modules occurs nat- 
urally during the application of our component linking rule, ensuring that fully 
linked programs contain no unresolved APD assumptions. 
Before going into technical details, we first summarize key aspects of VST. 
2 Program verification using VST


Verification using VST happens exclusively inside the Coq proof environment, 
and operates directly on abstract syntax trees of CompCert Clight. Typically, 
these ASTs result from feeding a C source file through CompCert’s frontend, 
clightgen, but they may also originate from code synthesis. Either way, verifica- 
tion applies to the same code that is then manipulated by CompCert’s optimiza- 
tion and backend phases. This eliminates the assurance gap that emerges when 
a compiler’s (intermediate) representation diverges syntactically or semantically 
from a verification tool’s representation. The absence of such gaps is the gist of 
VST’s machine-checked soundness proof: verified programs are safe w.r.t. the op- 
erational semantics of Clight; this guarantee includes memory safety (absence of 
null-pointer dereferences, out-of-bounds array accesses, use-after-frees,...) but 
also absence of unintended numeric overflows or race conditions. As Clight code 
is still legal C code (although slightly simplified, and with evaluation order de- 
terminized), verification happens at a level the programmer can easily grasp. 

In contrast to other verification tools, VST does not require source code 
to be annotated with specifications. Instead, the verification engineer writes 
specifications in a separate Coq file. By not mixing specifications (let alone 
aspects of proof, such as loop invariants) with source code, VST easily supports 
associating multiple specifications with a function and constructing multiple 
proofs for a given code/specification pair. 

We write function specifications $\phi$ in the form $\{P\} \leadsto \{v.\ Q\}$ where $v$ denotes
the (sometimes existentially quantified) return value and P and Q are separation 
logic assertions. To shield details of its semantic model, VST exposes heap asser- 
tions using the type mpred rather than as direct Coq-level predicates. On top 
of mpred, assertions are essentially embedded shallowly, giving the user access 
to the logical and programmatic features of Coq when defining specifications. 

VST’s top-level notion asserting that a (closed) program $p$ (which must
include main, with a standard specification) has been verified in Coq is
$\vdash p : G$ (“semax_prog”). Here, $G$ (of type funspecs, i.e. associating
specifications $\phi$ to function identifiers $f$) constitutes a witnessing proof
context that contains specifications for all functions in $p$ and must itself be
justified: for each $(f, \phi) \in G$, the user must exhibit a Coq proof of
$G \vdash f : \phi$ (“semax_body”), expressing that $f$ satisfies $\phi$ under
hypotheses in $G$. VST’s step-indexed model ensures
logical consistency in case of (mutual) recursion.

We exploit Beringer and Appel [15]’s theory of specification subsumption
$\phi <: \psi$, which extends parameter adaptation [38,50,48] to step-indexed
separation logics for C and allows a function verified w.r.t. $\phi$ to be used by
clients expecting specification $\psi$. This theory includes a notion of
specification intersection $\phi \wedge \psi$ which (similar to, e.g., the also combinator of
the Java Modelling Language (JML, [19])) allows functions to have multiple
specifications. Noticeably, subsumption and intersection are related in formally
the same manner as intersection types and subtyping are in type theory: in
particular, they satisfy


$$\phi_1 \wedge \phi_2 <: \phi_i \ \text{ for } i \in \{1,2\} \qquad \text{and} \qquad \frac{\psi <: \phi_1 \qquad \psi <: \phi_2}{\psi <: \phi_1 \wedge \phi_2} \qquad \text{(cf. [58], page 206).}$$
3 VSU calculus


As described above, VST verification amounts to exhibiting a $G$ with $\vdash p : G$. In
contrast to VST’s previous linking regime, VSU ensures existence of G during 
component linking without actually constructing G, maintaining representation 
hiding and non-exposure of private functions. Indeed, the modules’ specification 
interfaces (specs of imported and exported functions) suffice for proving that a 
suitable G exists, as long as each module’s individual justification includes the 
verification of its private functions. 


3.1 Components and soundness 


VSU extends CompCert’s distinction between internal functions (those equipped 
locally with a function body) and external functions (functions defined in other 
compilation units, incl. system functions). Given a Clight compilation unit p, we 
denote these (disjoint) sets by IntFuns(p) and ExtFuns(p), respectively. VSU 
further distinguishes between system functions (typically provided by the OS) 
and ordinary external functions: the former ones are not expected to be verified 
using VST even in a fully linked program, so VSU merely records their use. 

VSU’s main judgment is $\vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}]$, to be read as: using specified imports
$\mathcal{I}$ and system functions $\mathcal{S}$, $p$ provides/exports functions (with specifications) $\mathcal{E}$,
using internal memory satisfying (initially) $P$. The entities $\mathcal{S}$, $\mathcal{I}$, and $\mathcal{E}$ are all
funspecs, while $P$ specifies the memory holding $p$’s global variables; $P$’s formal
type is $\mathsf{globals} \to \mathsf{mpred}$ where globals refers to a map from global identifiers
to CompCert values.

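The judgment $\vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}]$ is formally introduced as an existential abstraction
(in Coq: a Record type) over a proof context $G$, which is again of type funspecs:


$$\vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}] \ \stackrel{\mathrm{def}}{=}\ \exists G.\ G \vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}].$$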


The role of G is to serve as the witness justifying the specification interface; as 
such it associates specifications also to p’s private functions; existentially hiding 
it shields implementation details. 

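The formation of the lower-level judgment $G \vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}]$ is subject to the
following constraints:


Definition 1. Proof context $G$ justifies a component (specification) for Clight
compilation unit $p$ with respect to system calls $\mathcal{S}$, imports $\mathcal{I}$, exports $\mathcal{E}$, and
predicate $P$, notation $G \vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}]$, if


1. $\mathrm{dom}\,\mathcal{I} \cap \mathrm{dom}\,\mathcal{S} = \emptyset$ and $\mathrm{dom}\,\mathcal{I} \cup \mathrm{dom}\,\mathcal{S} \subseteq \mathsf{ExtFuns}(p)$,

2. $\mathrm{dom}\,G = \mathsf{IntFuns}(p) \cup \mathrm{dom}\,\mathcal{S}$, with $G(i) = \mathcal{S}(i)$ whenever $i \in \mathrm{dom}\,\mathcal{S}$,

3. $\mathrm{dom}\,\mathcal{E} \subseteq \mathrm{dom}\,G$, with $G(i) <: \mathcal{E}(i)$ for all $i \in \mathrm{dom}\,\mathcal{E}$,

4. $\mathcal{I} \cup G \vdash_{\mathrm{func}} \mathit{funs}_p : G$,

5. $\forall g,\ \mathsf{InitGPred}(\mathsf{Vardefs}(p))\ g \vdash P\ g$.
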
The first three clauses are largely administrative; they express, respectively,
that (1) system functions and imported functions are disjoint sets of external 
functions, (2) G contains specifications for exactly the system functions and the 
internal functions, and (3) all exported specifications are abstractions of entries 
in G, in the sense of specification subsumption <:. 

Clause (4) constitutes the main proof obligation and refers to a slight
refactoring of VST’s function-verification judgment $G_1 \vdash_{\mathrm{func}} \mathit{funs} : G_2$ (semax_func),
where $\mathit{funs}$ associates CompCert function definitions with identifiers. The
instantiation $\mathcal{I} \cup G \vdash_{\mathrm{func}} \mathit{funs}_p : G$ hence requires that imports $\mathcal{I}$ suffice for justifying
all entries in $G$: each system function specification in $G$ must be valid, and each
specification of an internal function must be justified by a VST proof of the
corresponding function body in $\mathit{funs}_p$; calls to internal and system functions inside
the body are resolved by reference to $G$, and calls to external functions are
resolved by the import specifications, $\mathcal{I}$.

Finally, clause (5) requires p’s global variables to collectively satisfy P (after 
initialization) but avoids referring to these variables by name. 

We point out two further aspects of Definition 1. First, we note that system
functions may be exported (we do not require $\mathrm{dom}\,\mathcal{S} \cap \mathrm{dom}\,\mathcal{E} = \emptyset$), and that
imports and exports are distinct ($\mathrm{dom}\,\mathcal{I} \cap \mathrm{dom}\,\mathcal{E} = \emptyset$ follows). Second, we note
that for $\mathcal{I} = \emptyset$, clause (4) yields $G \vdash_{\mathrm{func}} \mathit{funs}_p : G$, i.e. the heart of VST’s
soundness condition semax_prog for programs comprised of a single compilation unit.
Hence, the goal of VSU verification is to exhaustively apply VSU’s combination
rule (presented in the next subsection) until all imports have been resolved.

Once a component has been verified and is exposed as $\vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}]$, the
specifications of $p$’s private functions are hidden inside the existentially quantified
context G and hence inaccessible. 
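The three administrative clauses of Definition 1 are simple set and map conditions, which the following sketch makes concrete. It is a toy model and not VST code: funspecs are Python dictionaries from function names to specifications, a specification is a frozenset of guaranteed properties, and `subsumes(phi, psi)` stands in for $\phi <: \psi$ by requiring that `phi` guarantees at least what `psi` does. The function `growPool`, the property names, and the use of `malloc` as a system function are hypothetical; the remaining function names are borrowed from the connection pool API used later in the paper. Clauses (4) and (5) are proof obligations and are not modelled.

```python
def spec(*properties):
    """Toy specification: the set of properties a function guarantees."""
    return frozenset(properties)


def subsumes(phi, psi):
    """Toy stand-in for phi <: psi: phi guarantees at least what psi does."""
    return psi <= phi


def check_administrative_clauses(int_funs, ext_funs, S, I, E, G):
    """Return the truth values of clauses (1)-(3) of Definition 1."""
    dom_S, dom_I, dom_E, dom_G = set(S), set(I), set(E), set(G)
    # (1) system functions and imports are disjoint sets of external functions
    clause1 = dom_I.isdisjoint(dom_S) and (dom_I | dom_S) <= ext_funs
    # (2) G covers exactly the internal and system functions, agreeing with S
    clause2 = dom_G == (int_funs | dom_S) and all(G[i] == S[i] for i in dom_S)
    # (3) every export is an abstraction of the corresponding entry of G
    clause3 = dom_E <= dom_G and all(subsumes(G[i], E[i]) for i in dom_E)
    return clause1, clause2, clause3


# Hypothetical connection-pool unit with one private function, growPool.
int_funs = {"consPool", "getConn", "freeConn", "growPool"}
ext_funs = {"consConn", "malloc"}
S = {"malloc": spec("fresh_block")}
I = {"consConn": spec("fresh_connection")}
G = {
    "consPool": spec("fresh_pool"),
    "getConn": spec("valid_connection", "pool_preserved"),
    "freeConn": spec("pool_preserved"),
    "growPool": spec("pool_preserved"),
    "malloc": spec("fresh_block"),
}
E = {
    "consPool": spec("fresh_pool"),
    "getConn": spec("valid_connection"),  # weaker than the entry in G
    "freeConn": spec("pool_preserved"),
}

result = check_administrative_clauses(int_funs, ext_funs, S, I, E, G)
print(result)
```

In this model the private function `growPool` appears in `G` but not in `E`, which mirrors how private specifications stay inside the existentially hidden context.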


3.2 Derived rules 


It is easy to derive a rule of consequence from Definition 1 that strengthens 
imports and relaxes exports: 


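$$\frac{\mathcal{I}' <: \mathcal{I} \qquad \vdash^{P}_{\mathcal{S}} [\mathcal{I}]\ p\ [\mathcal{E}] \qquad \mathcal{E} \sqsubseteq \mathcal{E}' \qquad \forall g,\ P\ g \vdash P'\ g}{\vdash^{P'}_{\mathcal{S}} [\mathcal{I}']\ p\ [\mathcal{E}']}\ \textsc{VSUConseq}$$


For imported functions, we require pointwise subsumption, by defining $\mathcal{I}' <: \mathcal{I}$ to
hold if $\mathrm{dom}\,\mathcal{I} = \mathrm{dom}\,\mathcal{I}'$ and $\mathcal{I}'(i) <: \mathcal{I}(i)$ for all $i \in \mathrm{dom}\,\mathcal{I}$. On the export side, we
allow hiding of entries, by defining $\mathcal{E} \sqsubseteq \mathcal{E}'$ to hold if $\mathrm{dom}\,\mathcal{E}' \subseteq \mathrm{dom}\,\mathcal{E}$ and
$\mathcal{E}(i) <: \mathcal{E}'(i)$ for all $i \in \mathrm{dom}\,\mathcal{E}'$. The calculus is invariant in the specifications of system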
functions, but allows weakening of the initialization predicate. The derivation 
of this rule instantiates the context witnessing the concluding judgment by the 
(abstract) witness obtained from unfolding the hypothetical judgment. 

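VSU’s workhorse is the composition rule, VSULINK, shown in Figure 1. The
side conditions treat the components symmetrically and are motivated as follows.
The rule constructs a component specification for a linked program $p$ that retains
the internal functions of $p_1$ and $p_2$, and also any unresolved external functions, as
detailed in side conditions (b). Condition (c) requires functions classified
identically by $p_1$ and $p_2$ to have identical definitions, and requires differently classified
functions to have identical type signatures and be in the import set of the
compilation unit not providing the implementation. Condition (d) formalizes that
system functions are not locally defined in either unit. Condition (e) expresses
that a function imported by one module and programmatically provided by the
other module must be exported by the provider; this condition ensures that the
export contract cannot be bypassed. Condition (f) expresses that functions
imported by both units must be imported identically; if necessary, this can be
achieved using the consequence rule. Condition (g) calculates the remaining
import specifications by combining the constituent imports, removing entries for
the resolved functions, and ensuring the absence of duplicates. The final
condition, (h), mandates that global variables from $p_1$ and $p_2$ be distinct (hence
initialization predicates have disjoint footprints) and propagated to $p$.


$$
\begin{array}{ll}
\text{(a)} & \vdash^{P_1}_{\mathcal{S}_1} [\mathcal{I}_1]\ p_1\ [\mathcal{E}_1] \qquad \vdash^{P_2}_{\mathcal{S}_2} [\mathcal{I}_2]\ p_2\ [\mathcal{E}_2] \\[4pt]
\text{(b)} & \forall i \in \mathsf{IntFuns}(p_1) \cup (\mathsf{ExtFuns}(p_1) \setminus \mathsf{IntFuns}(p_2)),\ p(i) = p_1(i) \\
 & \forall i \in \mathsf{IntFuns}(p_2) \cup (\mathsf{ExtFuns}(p_2) \setminus \mathsf{IntFuns}(p_1)),\ p(i) = p_2(i) \\
 & \mathrm{dom}\,p = \mathrm{dom}\,p_1 \cup \mathrm{dom}\,p_2 \\[4pt]
\text{(c)} & \forall i \in (\mathsf{IntFuns}(p_1) \cap \mathsf{IntFuns}(p_2)) \cup (\mathsf{ExtFuns}(p_1) \cap \mathsf{ExtFuns}(p_2)),\ p_1(i) = p_2(i) \\
 & \forall i \in \mathsf{IntFuns}(p_1) \cap \mathsf{ExtFuns}(p_2),\ \mathrm{sig}(p_1(i)) = \mathrm{sig}(p_2(i)) \wedge i \in \mathrm{dom}\,\mathcal{I}_2 \\
 & \forall i \in \mathsf{IntFuns}(p_2) \cap \mathsf{ExtFuns}(p_1),\ \mathrm{sig}(p_2(i)) = \mathrm{sig}(p_1(i)) \wedge i \in \mathrm{dom}\,\mathcal{I}_1 \\[4pt]
\text{(d)} & \mathrm{dom}\,\mathcal{S}_1 \cap \mathsf{IntFuns}(p_2) = \emptyset \qquad \mathrm{dom}\,\mathcal{S}_2 \cap \mathsf{IntFuns}(p_1) = \emptyset \\[4pt]
\text{(e)} & \forall i \in \mathrm{dom}\,\mathcal{I}_2 \cap (\mathrm{dom}\,\mathcal{S}_1 \cup \mathsf{IntFuns}(p_1)),\ i \in \mathrm{dom}\,\mathcal{E}_1 \wedge \mathcal{E}_1(i) <: \mathcal{I}_2(i) \\
 & \forall i \in \mathrm{dom}\,\mathcal{I}_1 \cap (\mathrm{dom}\,\mathcal{S}_2 \cup \mathsf{IntFuns}(p_2)),\ i \in \mathrm{dom}\,\mathcal{E}_2 \wedge \mathcal{E}_2(i) <: \mathcal{I}_1(i) \\[4pt]
\text{(f)} & \forall i \in \mathrm{dom}\,\mathcal{I}_1 \cap \mathrm{dom}\,\mathcal{I}_2,\ \mathcal{I}_1(i) = \mathcal{I}_2(i) \\[4pt]
\text{(g)} & \mathcal{I} = \mathcal{I}_1 \setminus (\mathrm{dom}\,\mathcal{S}_2 \cup \mathsf{IntFuns}(p_2)) \,\cup\, \mathcal{I}_2 \setminus (\mathrm{dom}\,\mathcal{S}_1 \cup \mathsf{IntFuns}(p_1)) \\[4pt]
\text{(h)} & \mathsf{Vardefs}(p_1) \cap \mathsf{Vardefs}(p_2) = \emptyset \qquad \mathsf{Vardefs}(p_1) \cup \mathsf{Vardefs}(p_2) = \mathsf{Vardefs}(p) \\[2pt]
\hline
 & \vdash^{P_1 * P_2}_{\mathcal{S}_1 \sqcap \mathcal{S}_2} [\mathcal{I}]\ p\ [\mathcal{E}_1 \sqcap \mathcal{E}_2]
\end{array}
$$


Fig. 1. VSU’s rule of component composition, VSULINK.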
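Conditions (f) and (g) are purely computational and can be sketched on a toy model in which funspecs are Python dictionaries from function names to opaque specification values. The function name `remaining_imports`, the import `logMsg`, and the string placeholders for specifications are hypothetical and only serve the illustration; the other function names come from the connection pool example. The sketch does not model conditions (a) to (e) or (h).

```python
def remaining_imports(I1, I2, S1, S2, int_funs1, int_funs2):
    """Compute the import context of the linked program, as in condition (g).

    Raises ValueError if a function imported by both units is imported with
    different specifications, i.e. if condition (f) fails.
    """
    shared = set(I1) & set(I2)
    if any(I1[i] != I2[i] for i in shared):
        raise ValueError("condition (f) violated: shared imports differ")
    # Imports of one unit are resolved by the system or internal functions
    # of the other unit and are removed from the combined import set.
    resolved_for_1 = set(S2) | int_funs2
    resolved_for_2 = set(S1) | int_funs1
    imports = {i: s for i, s in I1.items() if i not in resolved_for_1}
    imports.update({i: s for i, s in I2.items() if i not in resolved_for_2})
    return imports


# Unit 1: the connection module; unit 2: the pool module, which imports
# consConn from unit 1. Both units import a hypothetical logMsg function.
int_funs1 = {"consConn", "newDB"}
int_funs2 = {"consPool", "getConn", "freeConn"}
S1 = {}
S2 = {"malloc": "malloc_spec"}
I1 = {"logMsg": "log_spec"}
I2 = {"consConn": "consConn_spec", "logMsg": "log_spec"}

linked_imports = remaining_imports(I1, I2, S1, S2, int_funs1, int_funs2)
print(linked_imports)
```

After linking, `consConn` is no longer an import because unit 1 provides it, while the shared import `logMsg` remains unresolved and appears once.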

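The most interesting aspect of the rule is the duplicate use of the intersection
operator, $C_1 \sqcap C_2$, for constructing the concluding specifications of exported
functions and system functions. The general definition of this operator is


$$C_1 \sqcap C_2 := \lambda i.\ \begin{cases} C_1(i) \wedge C_2(i) & \text{if } i \in \mathrm{dom}\,C_1 \cap \mathrm{dom}\,C_2 \\ C_1(i) & \text{if } i \in \mathrm{dom}\,C_1 \setminus \mathrm{dom}\,C_2 \\ C_2(i) & \text{if } i \in \mathrm{dom}\,C_2 \setminus \mathrm{dom}\,C_1 \end{cases}$$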
where $\wedge$ denotes the specification intersection operator mentioned in Section 2.
Thus, exporting $\mathcal{E}_1 \sqcap \mathcal{E}_2$ effectively exports both $\mathcal{E}_1$ and $\mathcal{E}_2$, and similarly for
$\mathcal{S}_1 \sqcap \mathcal{S}_2$. Indeed, the individual export specifications can be reestablished using
the consequence rule, as the properties of intersection specifications mentioned
in Section 2 lift to (export specification) contexts: we have $C_1 \sqcap C_2 \sqsubseteq C_i$ for
$i \in \{1,2\}$, and

$$\frac{X \sqsubseteq C_1 \qquad X \sqsubseteq C_2}{X \sqsubseteq C_1 \sqcap C_2}$$

for any $X$.

By permitting functions $f$ that are internal to both $p_1$ and $p_2$, VSU supports
diamond-shaped composition patterns in which a sub-component, e.g. a library,
is imported multiple times. Conditions (b) and (c) ensure that all copies of a
repeatedly imported function $f$ have the same body (i.e. CompCert AST), and
that this body is retained in $p$. However, the library’s export specification may
have been imported differently by the different units, hence $G_1$ and $G_2$ may well
associate different (and formally incompatible) specifications with $f$. As $G_1$ and
$G_2$ are existentially hidden, we cannot inspect these specifications: adding a side
condition to the rule that mentions the specifications $G_1(f)$ and $G_2(f)$ would
violate the abstraction principle. Nevertheless, the proof of the composition rule
still requires us to attach some specification to the shared function when
constructing the witnessing context of the concluding judgment, $G$. Our solution is
to use intersection $\sqcap$, i.e. to instantiate the witness $G$ with $G_1 \sqcap G_2$ in the Coq
proof of VSULINK. By terminating the Coq proof script with Qed rather than
Defined, this instantiation is opaque to clients: applications of VSULINK during
program verification merely see that some $G$ exists.

Most side conditions of the rule are computational; in our applications of the
rule in Sections 4.5 and 5, Coq’s tactical engine solves the majority of them.


4 APDs and specification interfaces 


We now turn to the organization of predicates and function specifications. Our
organization reflects typical realizations of abstraction principles in C, where
heap data structures are introduced using forward declarations and referred to
via pointers in header files, while the selection of a concrete representation
(perhaps using private static variables) is private to an implementation. We
illustrate our approach using Parkinson and Bierman’s connection pool example [56],
ported to C as an implementation of the APIs in Figure 2. Using forward
declarations, the header files reveal only minimal information about the implementation.
Connection.h allows clients to create a database entity (the parameter denotes
a unique identifier; Parkinson and Bierman omit this constructor and do not
model the type database explicitly) and to create connections to a database
using the constructor consConn. Connectionpool.h models a collection of (dormant)
connections associated with a database; clients construct a pool using consPool,
request connections using getConn, and return them using freeConn.


/* Connection.h */
typedef struct database *Database;
typedef struct connection *Connection;
Connection consConn (Database d);
Database newDB (int DBidentifier);

/* Connectionpool.h */
#include "Connection.h"
typedef struct pool *Pool;
Pool consPool (Database d);
Connection getConn (Pool p);
void freeConn (Pool p, Connection c);


Fig. 2. Connection pools in C: Connection.h (top) and Connectionpool.h (bottom).
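To make the header-only view concrete, the following is a minimal sketch of one possible implementation behind the two APIs of Figure 2, collapsed into a single file. It is not the verified code from [14]: the struct layouts, the counter name next_conn, the field names, and the helper surely_malloc (a stand-in that allocates or terminates) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Client-visible part, as in Figure 2: only opaque pointer types. */
typedef struct database *Database;
typedef struct connection *Connection;
typedef struct pool *Pool;

/* Private representations (hypothetical, not taken from the paper). */
struct database   { int id; };
struct connection { Database db; int serial; Connection next; };
struct pool       { Database db; Connection dormant; };

/* Private module state: a counter for the connections issued so far. */
static int next_conn = 0;

/* Hypothetical stand-in for surely_malloc: allocate or terminate. */
static void *surely_malloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) exit(1);
    return p;
}

Database newDB(int DBidentifier) {
    Database d = surely_malloc(sizeof *d);
    d->id = DBidentifier;
    return d;
}

/* May fail: returns NULL and leaves the counter unchanged in that case. */
Connection consConn(Database d) {
    Connection c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->db = d;
    c->serial = next_conn++;
    c->next = NULL;
    return c;
}

Pool consPool(Database d) {
    Pool p = surely_malloc(sizeof *p);
    p->db = d;
    p->dormant = NULL;
    return p;
}

/* Reuse a dormant connection if one exists, otherwise create a new one. */
Connection getConn(Pool p) {
    assert(p != NULL);
    Connection c = p->dormant;
    if (c != NULL) {
        p->dormant = c->next;
        c->next = NULL;
        return c;
    }
    return consConn(p->db);
}

/* Return a connection to the pool; it becomes dormant again. */
void freeConn(Pool p, Connection c) {
    assert(p != NULL && c != NULL);
    c->next = p->dormant;
    p->dormant = c;
}
```

A client that only sees the headers can create a database, build a pool, and cycle connections through getConn and freeConn without learning anything about the fields above; in this sketch, a connection returned to the pool is handed out again by the next getConn.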


4.1 Abstract predicate declarations (APDs) 


Figure 3 introduces abstract predicate declarations for the three data structures. 
Each APD declares zero or more spatial predicates, i.e. mpreds relating a Comp- 
Cert (pointer) value to suitable semantic information. Semantic information for 
the database is a DBIndex (effectively a mathematical integer); connection and 
pool structures maintain pointers to the database; connections have additional 
internal state represented by the (abstract) type ConnTP. 


Record DatabaseAPD := {
  DB: DBIndex → val → mpred;
  DB_ptrnull: ∀ db s, DB db s ⊢ !!(is_pointer_or_null s) }.

Record ConnectionAPD := {
  ConnTP: Type;
  Conn: (val × ConnTP) → val → mpred;
  NextConn: ConnTP → globals → mpred;
  Conn_isptr: ∀ C c, Conn C c ⊢ !!(isptr c);
  Conn_validptr: ∀ C c, Conn C c ⊢ valid_pointer c }.

Record PoolAPD := {
  CPool: val → val → mpred;
  CPool_ptrnull: ∀ d p, CPool d p ⊢ !!(is_pointer_or_null p) }.


Fig. 3. APDs for the connection pool example. val is CompCert’s type of values. 


Specifically, DatabaseAPD corresponds to the Database type declaration in 
Connection.h and asserts existence of a predicate DB, together with an axiom 
that enables clients to store a reference to a database in their own data structure. 
Operator !! injects a Coq proposition into VST’s assertion language. 

In similar style, ConnectionAPD and PoolAPD declare predicates Conn and 
CPool for the struct declarations connection and pool. In contrast to Parkinson 
and Bierman, we model that the connection module maintains state using the 
predicate NextConn. There is no need to reveal the concrete static variable 
used by our implementation though: globals denotes the collection of all such 
variables in VST. We assert that the head values of Conn and CPool are provably 
nonnull pointers and that a Conn’s head pointer is furthermore valid. 

All APDs are introduced as (dependent) Record types in Coq. We will con- 
struct values of these types in Section 4.3, i.e. implementation-dependent con- 
crete predicate definitions and lemmas validating the axioms. But first, we use 
the APD types abstractly to introduce specifications for the two modules.


4.2 Abstract specification interfaces (ASIs)


Abstract specification interfaces (ASIs) consist of VST specifications for the
API-exposed functions, parametric in all relevant APDs. In addition to the APDs
introduced above, our example uses a third APD, denoted M, that declares an
abstract predicate Mem_M gv and represents the malloc/free library.

Figure 4 shows the ASI of Connection.h. We use subscripts to refer to the
APD parameters: for example, DB_D i p is the mpred obtained by applying the
DB component of a database APD D to index i and pointer value p.


Function          Spec
newDB(i; gv)      {Mem_M gv} ~> {p. DB_D i p * Mem_M gv}
consConn(d; gv)   {DB_D i d * NextConn_C c gv * Mem_M gv} ~>
                  {p. if p = null then NextConn_C c gv * DB_D i d * Mem_M gv
                      else ∃c'. NextConn_C c' gv * Conn_C (d, c) p * DB_D i d * Mem_M gv}


Fig. 4. ASI for Connection.h, parametric in databases (D), connections (C), and
memory systems (M). Mem_M gv represents M’s abstract predicate for a memory manager
that is accessed by malloc and free.


A specification F($\vec{x}$; gv) : {Pre} ~> {v. Post} is to be understood in
safety-guaranteeing partial-correctness style, where $\vec{x}$ denotes a list of actual
arguments (of type val), gv refers to (if present) the global environment, v (again of
type val) represents the return value (if present), and other items are implicitly
universally quantified. Callers of such a function select instantiations for the
universally quantified entities (“witnesses”) and must then establish Pre.

Thus, the specification of newDB asserts that a new database entity satisfying
DB_D i p is allocated at the return value p, for the database with index i (an input
argument). The allocation draws upon the abstract predicate Mem_M gv which
is “located” at some global variable that is private to the malloc/free library.

The specification of constructor consConn refers to Mem_M gv in similar fashion
and advances the module’s connection counter from c to some c' upon success;
in contrast to Parkinson and Bierman, we also support unsuccessful requests.

The ASI for Connectionpool.h in Figure 5 is additionally parametric in a
PoolAPD, P. Our specifications are again slightly more precise than the ones
given by Parkinson and Bierman. As a consequence, the precondition of a
sequence such as p := consPool(d); c := getConn(p); freeConn(p,c) is DB_D i d *
Mem_M gv * NextConn_C s gv rather than emp, hence exposing the reliance on
the memory manager etc. Prefixing the instruction d := newDB(i) establishes
DB_D i d; we will explain how the latter two conjuncts are provided in Section 4.5.


4.3 Verification of ASI-specified compilation units 


Substantiating the ASI of a header file means to give, for a concrete
implementation, concrete definitions for the predicates in the newly introduced APDs,
to show that these definitions validate the associated axioms, and finally to construct
a VSU that has the ASI’s specifications as the export interface E. All these
constructions are parametric in the APDs provided by other modules.


Function             Spec
consPool(d; gv)      {Mem_M gv} ~> {p. CPool_P d p * Mem_M gv}
getConn(p; gv)       {CPool_P d p * DB_D i d * NextConn_C c gv * Mem_M gv} ~>
                     {n. CPool_P d p * DB_D i d * Mem_M gv *
                         (if n = null then NextConn_C c gv
                          else ∃c' c''. NextConn_C c' gv * Conn_C (d, c'') n)}
freeConn(p, i; gv)   {CPool_P d p * Conn_C (d, c) i * Mem_M gv} ~> {CPool_P d p * Mem_M gv}


Fig. 5. The ASI for Connectionpool.h is parametric in a database APD (D), a
connection APD (C), a connection pool APD (P), and a memory manager APD (M). As
consPool takes a formal parameter d, the reader may have expected the specification
{DB_D i d * Mem_M gv} ~> {p. CPool_P d p * DB_D i d * Mem_M gv}, which is indeed
derivable from the one given using VST’s frame rule.
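The derivation mentioned in the caption of Figure 5 can be written out as a single application of the frame rule. The block below is a sketch in generic separation-logic notation, not VST’s concrete rule syntax; the frame R is instantiated with the database predicate, and the usual side condition (the called function does not modify anything mentioned in R) is assumed to hold because R is not part of consPool’s footprint.

```latex
\[
\frac{\{\, \mathit{Mem}_M\; gv \,\}\;\; \mathit{consPool}(d; gv) \;\;
      \{\, p.\; \mathit{CPool}_P\; d\; p * \mathit{Mem}_M\; gv \,\}}
     {\{\, \mathit{DB}_D\; i\; d * \mathit{Mem}_M\; gv \,\}\;\; \mathit{consPool}(d; gv) \;\;
      \{\, p.\; \mathit{CPool}_P\; d\; p * \mathit{DB}_D\; i\; d * \mathit{Mem}_M\; gv \,\}}
\;\;\text{frame rule with } R := \mathit{DB}_D\; i\; d
\]
```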

We refer the reader to our source code [14] for the C implementation, the
(concrete) predicate definitions, and the proofs of the APD-supporting axioms. In
the case of Connection.c, these proofs reveal the instantiation of the APD’s ConnTP
to Coq’s type of integers, Z, corresponding to the existence of a global integer
variable in the C code that maintains a connection counter; the corresponding
points-to predicate then furnishes the abstract predicate NextConn.

The substantiations of a unit’s APDs are subsequently used to instantiate
its ASI and the specifications of its imported functions, yielding (together with
specifications of private functions) a proof context G against which the unit’s
local function bodies are then verified. APDs provided by other compilation units are
left abstract, so they expose only their axioms. Specifically, the substantiation for
Connection.c yields values c and d of types ConnectionAPD and DatabaseAPD,
respectively, the predicate N = NextConn c 0, and a VSU


VSUconn = Ha [Zcon] Connection.prog [Econn| 


where E_Conn is the partial specialization of the specifications in Figure 4 to C
= c and D = d, Connection.prog is CompCert’s AST for Connection.c, and
I_Conn contains a specification for surelymalloc. For ConnectionPool.c, we similarly
obtain a value p of type PoolAPD and a VSU


VSUpo01 = ae [Zpoo1] Connectionpool.prog [Epo], 
where I_Pool comprises the (abstract) specification of consConn and
specifications for free and surelymalloc, and E_Pool is the partial specialization of Figure 5
to P = p. Both VSUs are parametric in M, but VSU_Pool’s additional
parameters D and C are instantiated when VSU_Conn and VSU_Pool are combined using
rule VSULINK. The result, VSU_CP, is still parametric in M but has resolved the
import of consConn, leaving only imports for free and surelymalloc.


4.4 A VSU for a malloc-free library


A recent application of VST is Appel and Naumann’s verification of a malloc/free 
library [5]. Internally maintaining a fixed number of freelists — for entities of 
different size — this library exposes four functions in its API: malloc, free, pre_fill, 
try_pre_fill. When porting this development to the VSU framework, these give 
rise to two ASIs. The first one contains specifications for all four functions and 
is suitable for resource-aware clients. It employs the APD MallocFree_R_APD: 


Record MallocTokenAPD := {
  malloc_token': share → Z → val → mpred;
  malloc_token'_valid_pointer: ∀ sh sz p, malloc_token' sh sz p ⊢ valid_pointer p;
  malloc_token'_facts: ∀ sh sz p, malloc_token' sh sz p ⊢ !! malloc_compatible sz p }.
Record MallocFree_R_APD :=
  { MF_Tok_R :> MallocTokenAPD; mem_mgr_R: resvec → globals → mpred }.


mem_mgr_R models the freelists as a resource vector that indicates the length
of each freelist. The predicate malloc_token' refers to the piece of memory that
is typically located at a small negative offset of a malloc’ed entity and holds
administrative information of the library, but conceptually, it also constitutes
a token that enables clients to share malloc’ed entities among different threads
without losing the ability to safely free entities. The second ASI only exposes
malloc and free, and employs the more abstract APD


Record MallocFreeAPD :=
  { MF_Tok :> MallocTokenAPD; mem_mgr: globals → mpred }.


MF_Tok still presents a malloc token but mem_mgr now hides the existence of
freelists; indeed, constructing a MallocFreeAPD from a MallocFree_R_APD simply
quantifies existentially over a resource vector. Our proofs first refactor the prior
verification as a VSU that exports a resource-aware ASI and then use VSUCONSEQ
(and export restriction ⊑ from Section 3.2) to weaken the resulting VSU
to a VSU that only exports a resource-ignorant ASI. We denote the latter as
VSU_MF; the predicate Mem_M gv is now revealed to be a shorthand for mem_mgr
gv, parametric in a MallocFreeAPD M, and we use Mem_MF gv below to refer to
its instantiation for VSU_MF.


4.5 Putting it all together 


Using VSULINK again, we link VSU_CP with a library VSU (reducing surelymalloc
to malloc and the system function exit) and then with VSU_MF, obtaining


VSUappiiv = Ai aN [] coreprog [Ecore]. 


Here, coreprog contains all code (application plus library) with the exception of
main. Note that VSU_AppLib’s set of imports is empty; S_core contains axiomatic
specifications of OS functions such as exit and mmap.

Independently of the construction of VSU_AppLib, we verify main, i.e. an
exemplary client or unit test, as a semax_body statement w.r.t. a not yet instantiated
copy of E_core. The specification that main is verified against is a <: specialization
of VST’s general main_spec but is still abstract in the APDs of the application’s
code modules; see [14] for details.

Finally, we connect VSU_AppLib with the verification of main to obtain a proof
of VST’s semax_prog statement. It is in this last proof that the satisfaction of
the abstract initialization predicates for the global variables, Mem_MF and N, is
established from VST’s internal initialization predicates.


5 Modular verification of the Subject/Observer pattern


Programs in imperative or object-oriented languages often contain callbacks:
chains of function calls A.m() → B.n() → A.l() between modules A and B in
which m’s invocation of n (and hence the return of control to A in the call to
l) happens when A’s state is invalid, i.e. does not satisfy A’s invariant. Clearly,
mandating satisfaction of the invariant in l’s precondition (a typical requirement
of API-level specifications) then prevents the verification of n.

A typical example is the chain update → notify → get in the subject-observer
pattern, a widely used design pattern [23] that has served as a litmus test for
modular specification of callback-rich programming in the literature. Figures 6
and 7 contain excerpts of a transcription of Parkinson’s [55] code into C.¹ Each
Subject maintains a list of subscribers, i.e. a list of observers that will be notified
whenever the Subject’s state is updated and then synchronize their internal state
accordingly using get. The intended invariants express that each Subject’s
observers are in sync, a property that is violated during update’s traversal of its
observer list, when not-yet-notified observers are out of sync but (precisely in
order to get back in sync) nevertheless invoke get.

The dominant technique for dealing with such situations in SMT-based tools
employs ghost fields that track validity and unfolding of invariants and are
supported by further (ghost) infrastructure that controls ownership (see e.g. [47,11]).
However, this does not necessarily achieve comprehensive representation hiding:
for example, the permission to violate Subject’s invariant in get’s precondition
propagates to the precondition of notify, allowing the latter function to access
the field Subject.value.² Furthermore, the invariant-regulating techniques
typically require that SMT solving be carried out on a whole-program basis.

The flexibility of APDs to introduce multiple predicates enables an
alternative in which callbacks are specified using special-purpose predicates that,
similar to typestates [62], emphasize protocol-style behavior, do not reveal the
validity of module invariants, and maintain representational hiding by being just
as abstract as a module’s main predicate.


1 Our implementation [14] contains two further callbacks, newObs → registr → notify
and registr → notify → get; the former one commences in the constructor, before
any invariant has been established.

2 For example, one may insert abstraction-violating get/putfield instructions in the
subject-observer code at http://comcom.csail.mit.edu/e4pubs/{#}observer. This
tool implements an advanced variant of invariance regulation using ghost
instructions, semantic collaboration [59], for Eiffel. Fields are not private, and the
methodology does not prevent representation exposure between such closely coupled classes.


/* SubjectObserver.h */
typedef struct subject *Subject;
typedef struct observer *Observer;

/* Subject.h */
#include "SubjectObserver.h"
Subject newSubject (void);
void registr (Subject s, Observer o);
void update (Subject s, int n);
int get (Subject s);
int freeSubject (Subject s);
Observer detachfirst (Subject s);

/* Observer.h */
#include "SubjectObserver.h"
Observer newObs (Subject s);
void notify (Observer o);
int val (Observer o);
void freeObserver (Observer o);

/* Subject_rep.h */
#include "SubjectObserver.h"
typedef struct node *Node;
struct node {
  Observer obs;
  struct node *next;
};
struct subject {
  Node obs;
  unsigned value;
};

/* Observer_rep.h */
#include "SubjectObserver.h"
struct observer {
  Subject sub;
  int cache;
};


Fig. 6. Subject/Observer: header files. SubjectObserver.h, Subject.h, and Observer.h
are the public APIs; Subject_rep.h and Observer_rep.h are private to their respective
module implementations.


/* Subject.c */
#include "surelyMalloc.h"
#include "Observer.h"
#include "Subject.h"
#include "Subject_rep.h"

int get (Subject s) { return (s->value); }

void update (Subject s, int v) {
  s->value = v; Node n = s->obs;
  while (n) { notify(n->obs); n = n->next; } }

/* Observer.c */
#include "surelyMalloc.h"
#include "Observer.h"
#include "Subject.h"
#include "Observer_rep.h"

void notify (Observer o) {
  o->cache = get(o->sub);
  return; }


Fig. 7. Excerpts from Subject.c and Observer.c for the callback update → notify → get.
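The invariant violation discussed above can be observed directly in a small executable sketch. The code below combines the excerpts of Figures 6 and 7 into one file and adds two things that are not part of the paper’s code: the checker in_sync, which tests the intended "all observers are in sync" invariant, and the counter out_of_sync_callbacks, which records how often notify issues its callback to get while that invariant is broken. The constructor bodies are likewise assumptions; only their call structure (newObs → registr → notify) follows the description in footnote 1.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct subject *Subject;
typedef struct observer *Observer;
typedef struct node *Node;

struct node     { Observer obs; struct node *next; };
struct subject  { Node obs; unsigned value; };
struct observer { Subject sub; int cache; };

/* Hypothetical instrumentation: callbacks issued while the invariant is broken. */
static int out_of_sync_callbacks = 0;

/* Intended invariant: every registered observer caches the subject's value. */
int in_sync(Subject s) {
    for (Node n = s->obs; n != NULL; n = n->next)
        if (n->obs->cache != (int)s->value) return 0;
    return 1;
}

int get(Subject s) { return (int)s->value; }

void notify(Observer o) {
    if (!in_sync(o->sub)) out_of_sync_callbacks++;
    o->cache = get(o->sub);          /* callback into the Subject module */
}

void update(Subject s, int v) {
    s->value = (unsigned)v;          /* the invariant may be broken from here on */
    for (Node n = s->obs; n != NULL; n = n->next)
        notify(n->obs);              /* each call repairs one observer */
}

Subject newSubject(void) {
    Subject s = malloc(sizeof *s);
    assert(s != NULL);               /* sketch only: no graceful failure path */
    s->obs = NULL;
    s->value = 0;
    return s;
}

void registr(Subject s, Observer o) {
    Node n = malloc(sizeof *n);
    assert(n != NULL);
    n->obs = o;
    n->next = s->obs;
    s->obs = n;
    notify(o);                       /* second callback chain from footnote 1 */
}

Observer newObs(Subject s) {
    Observer o = malloc(sizeof *o);
    assert(o != NULL);
    o->sub = s;
    o->cache = 0;
    registr(s, o);
    return o;
}

int val(Observer o) { return o->cache; }
```

With two observers registered, a single update to a new value triggers two callbacks to get, both of them issued while in_sync is false, and the invariant holds again once update returns. A specification of get that demands the full invariant in its precondition would therefore reject exactly these calls.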

Concretely, our approach employs semantic subjects that are comprised of a 
list of observer references and a (current) value, while observers are represented 
as a subject pointer and the cache: 


Definition SubjRep := (list val) × Z.
Definition ObsRep := val × Z.


Next, our APDs complement the predicates relevant for API calls by external
clients, Srep and Orep, by (residual) predicates for calling the Subject functions
registr, update, and get, and the Observer functions notify and val; we also
introduce a predicate for the postcondition of get, GetPost:

Record SubjectAPD := {
  Srep, RegPre, UpdPre, GetPre, GetPost: SubjRep → val → mpred;
  SubjRegister: ∀ S s, Srep S s ⊢ RegPre S s;
  SubjUpdate: ∀ S s, Srep S s ⊢ UpdPre S s;
  SubjGetPrePost: ∀ S s, Srep S s ⊢ GetPre S s * (GetPost S s -* Srep S s);
  GetPre_ptrnull: ∀ S s, GetPre S s ⊢ !!(is_pointer_or_null s) }.

Record ObserverAPD := {
  Orep, NtfPre, ValPre: ObsRep → val → mpred;
  ObsNtfy: ∀ O o, Orep O o ⊢ NtfPre O o;
  ObsVal: ∀ O o, Orep O o ⊢ ValPre O o;
  NtfPre_isptr: ∀ O o, NtfPre O o ⊢ !!(isptr o) }.

Entailment axioms such as SubjUpdate permit external clients to invoke callback
functions directly but may be omitted for functions that should only be invoked
via callbacks. The residual predicates sanction indirect invocations via callbacks
without revealing the satisfaction status of module-internal invariants.

Axiom SubjGetPrePost splits Srep into a token that can (only) be used to
invoke get, plus a token for reestablishing Srep from GetPost. The latter is a
separating implication rather than an entailment: it represents the requirement
that an observer yields back control to its subject after completing a callback to
get, the subject having retained part of its state prior to invoking notify.

Function       Spec
update(s, v)   {UpdPre_SP (l, z) s * Observers NtfPre_OP s vals l}
               ~> {Srep_SP (l, v) s * Observers Orep_OP s v^{|l|} l}
get(s)         {GetPre_SP S s} ~> {p. !!(p = snd(S)) && GetPost_SP S s}


Fig. 8. ASI of Subject (excerpt), parametric in a Subject APD (SP), an ObserverAPD
(OP), and a MemoryAPD (M).


To enforce these behaviors, we employ the specifications in Figures 8 and 9;
again, the ASIs are parametric in all APDs mentioned, notwithstanding the
mutual dependence of the modules. Using axiom SubjGetPrePost, one may show
that the specifications for get and notify are in subsumption relationship with
large-footprint counterparts that permit invocations by external clients:


{Srep_SP S s} ~> {p. !!(p = snd S) && Srep_SP S s}
{NtfPre_OP (s, c) o * Srep_SP S s} ~> {Orep_OP (s, snd S) o * Srep_SP S s}

The specification of update makes reference to an auxiliary Coq function that
represents the “big” separating conjunction
$\ast_{(v,o) \in \mathrm{combine}(vals,\, l)}\; P\,(s,v)\,o$,


Observers (P: ObsRep → val → mpred) (s: val) (vals: list Z) (l: list val): mpred.


The substantiation of these interfaces relative to our C implementations de- 
fines the main predicates as 


Definition Srep (l, v) s := ∃o. listrep l o * s ↦_{Ews}^{STP} (o, v) * Mtok(Ews, STP, s).
Definition Orep O o := o ↦_{Ews}^{OTP} O * Mtok(Ews, OTP, o).




Function              Spec
newObs(s; gv)         {RegPre_SP S s * Mem_M gv} ~>
                      {p. Orep_OP (s, snd S) p * Srep_SP (p :: fst S, snd S) s * Mem_M gv}
notify(o)             {NtfPre_OP (s, c) o * GetPre_SP S s} ~>
                      {Orep_OP (s, snd S) o * GetPost_SP S s}
val(o)                {ValPre_OP (s, c) o} ~> {c. Orep_OP (s, c) o}
freeObserver(o; gv)   {Orep_OP O o * Mem_M gv} ~> {Mem_M gv}


Fig. 9. ASI of Observer, parametric in APDs SP, OP, and M.


Here, listrep is a typical list representation predicate over Node items, modeling 
the observers associated with a Subject. STP and OTP are shorthands for Clight’s 
representation of the struct definitions Subject and Observer, Ews represents 
an exclusive writable share in VST, and Mtok(.,.,.) is a variant of predicate 
malloc_token’ from Section 4.4. 

Some residual predicates are minor variants of Srep and Orep. For example,


Definition NtfPre O o := Mtok(Ews, OTP, o) * ∃v. o ↦_{Ews}^{OTP} (fst O, v).


existentially abstracts over snd O but is otherwise identical to Orep. This makes
validating axiom ObsNtfy trivial. As NtfPre does not depend on a subject’s value,
no modification of the latter can affect the former. Other residual predicates,
like RegPre, are even definitionally equal to the main predicates, but the APD
mechanism ensures that this fact is not exposed to clients.

Our C implementation permits GetPre and GetPost to actually be defined
identically (indeed, getters typically don’t alter data structures...):


Definition GetPrePost (l, v) s := s.value ↦_{Ews}^{STP} v * Mtok(Ews, STP, s).


Here, p.π ↦ v is a variant of p ↦ v that specifies the content at p.π,
where path π is a list of field names and array subscripts. Thus, GetPrePost only
specifies the content of s.value; the remaining portion of s is exactly what is
retained when SubjGetPrePost splits off GetPre from a Subject. The motivation
for this handling is that the invariant of the loop in update (which contains the
callback to get via notify) only traverses the node list. Specifically, an
invariant involving the full Srep would not ensure that the spine of the list remains
unchanged, as the definition of Srep quantifies existentially over the node list.
This aspect illustrates the danger of predicates that are too abstract to be useful.

Constructing VSUs for Subject and Observer proceeds straightforwardly;
we exercise VSU’s support for shared libraries by first combining surelyMalloc
with each of these VSUs separately, before linking the resulting VSUs with each
other, with VSU_MF, and with a main client as described in Section 4.5.


5.1 Specification and proof reuse 


To evaluate specification modularity and proof reuse, we verified several varia- 
tions of our implementation. First, to evaluate robustness under representational 
change, we have Subject internally maintain a freelist of Observer nodes:

struct subject { Node fl; Node obs; unsigned value; };


The freelist is drawn upon in registr (we only invoke surely_malloc if fl is null)
and replenished in detachfirst. Constructor newSubject creates an empty freelist,
and freeSubject frees the entire list.
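The freelist variant can be sketched as follows. This is an illustration under stated assumptions rather than the code verified in [14]: the bodies of registr and detachfirst are guesses consistent with the description above, the callback to notify during registration is omitted to keep the sketch short, surely_malloc is a local stand-in, and the counter node_allocs is hypothetical instrumentation that makes node reuse observable.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct subject *Subject;
typedef struct observer *Observer;
typedef struct node *Node;

struct node     { Observer obs; struct node *next; };
struct subject  { Node fl; Node obs; unsigned value; };
struct observer { Subject sub; int cache; };

/* Hypothetical instrumentation: number of nodes obtained from the allocator. */
static int node_allocs = 0;

/* Hypothetical stand-in for surely_malloc: allocate or terminate. */
static void *surely_malloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) exit(1);
    return p;
}

/* The constructor creates an empty freelist and an empty observer list. */
Subject newSubject(void) {
    Subject s = surely_malloc(sizeof *s);
    s->fl = NULL;
    s->obs = NULL;
    s->value = 0;
    return s;
}

/* Draw a node from the freelist; allocate only if the freelist is empty. */
void registr(Subject s, Observer o) {
    assert(s != NULL);
    Node n = s->fl;
    if (n != NULL) {
        s->fl = n->next;
    } else {
        n = surely_malloc(sizeof *n);
        node_allocs++;
    }
    n->obs = o;
    n->next = s->obs;
    s->obs = n;
}

/* Detach the first observer and recycle its node into the freelist. */
Observer detachfirst(Subject s) {
    assert(s != NULL);
    Node n = s->obs;
    if (n == NULL) return NULL;
    Observer o = n->obs;
    s->obs = n->next;
    n->obs = NULL;
    n->next = s->fl;
    s->fl = n;
    return o;
}
```

Clients see the same signatures as before; only the private struct subject and the bodies of the Subject functions change, which matches the observation that the APDs and ASIs need no modification.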

The code modification triggers new Clight ASTs, but the majority of Coq 
files can then simply be reprocessed: the model-level definitions, APDs, and 
ASIs of Subject and Observer remain unchanged, and so do the files associated 
with verifying Observer, linking, and main. The only modifications are in the 
implementation-dependent validation of Subject, namely in the definitions of the 
representation predicates and in the VST proofs of the individual functions. 

Second, we verified a variant in which notify’s invocation of get is replaced 
by a function pointer. The key code modifications are 

/* Addition in SubjectObserver.h */
typedef int (*callback)(Subject s);

/* Modification in Observer.h */
void notify (Observer o, callback f);

/* Modification in Observer.c */
void notify (Observer o, callback f) { o->cache = f(o->sub); return; }

The calls to notify in update and registr obtain the additional argument &get,
and the specification of get can be removed from the imports of the Observer
VSU. The small specification of notify becomes


notify(o, g)   {NtfPre_OP (s, c) o * GetPre_SP S s * funcptr' φ_get g}
               ~> {Orep_OP (s, snd S) o * GetPost_SP S s}


where funcptr' φ g expresses that value g is a pointer to some function satisfying
specification φ, and φ_get is the entry for get from Fig. 8. notify’s large specification
is adapted similarly. Repairing the proofs incurs changes in < 10 lines of Coq.
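A self-contained sketch of the function-pointer variant is given below. It follows the three code modifications listed above; the struct layouts are those of Figure 6, while the null check on the callback is an addition made here for illustration and is not claimed to be part of the verified code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct subject *Subject;
typedef struct observer *Observer;
typedef struct node *Node;

/* Addition in SubjectObserver.h: the type of callbacks into Subject. */
typedef int (*callback)(Subject s);

struct node     { Observer obs; struct node *next; };
struct subject  { Node obs; unsigned value; };
struct observer { Subject sub; int cache; };

int get(Subject s) { return (int)s->value; }

/* Observer.c no longer names get; it calls whatever function it is handed. */
void notify(Observer o, callback f) {
    assert(f != NULL);               /* illustrative check, not in the paper */
    o->cache = f(o->sub);
}

/* Subject.c passes its own getter as the additional argument. */
void update(Subject s, int v) {
    s->value = (unsigned)v;
    for (Node n = s->obs; n != NULL; n = n->next)
        notify(n->obs, &get);
}
```

Because notify only depends on the callback type, the Observer module can be compiled, and in the setting of the paper verified, without importing get; the funcptr' conjunct in the specification is what ties the argument g back to get’s specification.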

A third modification exploits VST’s support for impredicative quantification
to abstract over GetPre_SP and GetPost_SP in the definition of φ, such that notify’s
specification is effectively parametric in suitable GetPre/GetPost pairs.
Adapting the verification involves step-indexed aspects of VST and hence requires a
little more work; details are included in the Coq development [14].

Finally, we verified a variation in which observers register with two subjects,
as an example of a more complex interaction pattern. As this affects model-level
functionality, modifications are not confined to module-internal predicate
definitions but affect APD declarations and ASI definitions. However, neither the
encapsulation of representation nor the modularity of verification was
compromised; supporting more than two subjects per observer would likely be similar.


5.2 Pattern-level specification 


An alternative specification of subject-observer was proposed by Parkinson [55],
who sidesteps the conflict between callbacks, modularity, and abstraction.
Giving up on specifying the two classes independently, this approach defines a single
abstract predicate, SubObs, that ties a subject to all its observers and yields
aggregate-level function specifications. We can recover such an aggregate
interface by proving that the specifications involving SubObs are abstractions (in the
sense of . <: .) of the exports of the SubjectObserver VSU, generically in APDs
SP and OP. Indeed, Parkinson’s formulation amounts to a two-predicate APD:


Record AggAPD :=
  { Sub: val → list val → Z → mpred; Obs: val → val → Z → mpred }.


with specifications shown in Figure 10, using the derived notions 


Definition SubObs s O v := Sub s O v * ∗_{o ∈ O} Obs o s v.
Definition Obs_ o s := ∃v. Obs o s v. (* Obs_ is related to Obs as ↦ _ is to ↦. *)
Definition SubObs_ s O := ∃v. SubObs s O v.


Function            Spec
newSubject(gv)      {Mem_M gv} ~> {s. SubObs__A s nil * Mem_M gv}
registr(s, o; gv)   {Sub_A s O v * Obs__A o s * Mem_M gv}
                    ~> {Sub_A s (o :: O) v * Obs_A o s v * Mem_M gv}
update(s, v)        {SubObs__A s O} ~> {SubObs_A s O v}
get(s)              {Sub_A s O v} ~> {v. Sub_A s O v}
newObs(s; gv)       {SubObs_A s O v * Mem_M gv} ~> {p. SubObs_A s (p :: O) v * Mem_M gv}
notify(o)           {Sub_A s O v * Obs__A o s} ~> {Sub_A s O v * Obs_A o s v}
val(o)              {SubObs_A s O v} ~> {v. SubObs_A s O v}


Fig. 10. Selected aggregate specifications, parametric in an AggAPD A. Except for the
occurrence of Mem_M gv, the specifications coincide with those of Parkinson [55].


Constructing an AggAPD A from an SP/OP pair is trivial: take Sub to be Srep_SP
and Obs to be Orep_OP; proving the . <: . lemmas is then straightforward.

SubObs constitutes a pattern invariant, or the pattern’s primary predicate,
with residuals Sub and Obs. From the aggregate’s point of view, update → notify
→ get is not a callback but an internal nesting of invocations, so the
small-footprint specifications typically don’t pose a problem for existing
methodologies; client-visible specifications with large footprints can be derived using the
frame rule. In this sense, the pattern reestablishes “sequential atomicity” of
operations. Exploring whether other design patterns can be similarly derived from
the ASIs of their constituent classes is a topic for future research: are typical de- 
sign patterns the abstraction units at which sequential atomicity is reestablished, 
callbacks at most occur in valid states, and residual predicates are avoided? 

An aggregate specification for the function pointer implementation from
Section 5.1 can be obtained using a modified AggAPD, with residual predicates
GetPre etc. But a better option is to remove the pattern-internal functions
notify, registr, and perhaps even get from the aggregate ASI. In fact, notify’s
new signature reveals the use of function pointers, hence even an aggregate-level
specification would have to include funcptr' φ g terms. Thus, we instead employ
the notion ⊑ from Section 3.2 to lift the VSU for SubjectObserver with function
pointers from Section 5.1 to a VSU for the aggregate but narrowed ASI and then
reverify main w.r.t. the latter.


6 Verification of object principles


This section considers features that — together with state encapsulation and 
modularity — are cornerstones of object orientation: the ability for (instances 
of) multiple implementations of an interface to dynamically coexist and inter- 
act, dynamic dispatch, subtyping, self, and inheritance. To maintain the dataless 
discipline, we employ a uniform but simple object encoding that is typical for 
industrial and open-source C developments: dynamic dispatch is implemented 
using function pointers that are bundled into separate structs (method tables) 
that are accessible as the first element of the object representations. Subtyping 
— providing additional methods — and representation inheritance are modeled by 
extending these structs, respectively, but are orthogonal to each other, and only 
the former one is exposed in APIs. In the second half of this section, we hide the 
dynamic dispatch mechanism behind a wrapper interface. We specify objects by 
reference to a semantic (Coq-level) object model, thus comprehensively separat- 
ing object reasoning from C-level reasoning: constructors establish, and methods 
maintain, abstract object predicates that clients need not (and cannot) unfold. 
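The object encoding just described can be sketched in C as follows, using bumpable one-dimensional points. Implementation I1 stores the coordinate directly. The second implementation, I2, the client function set_bump_get, and the helper surely_malloc are hypothetical and serve only to show that two data representations can coexist behind one method-table interface.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct point *Point;
typedef struct bpoint *BPoint;

struct bmethods {
    int  (*get)(Point);
    void (*set)(Point, int);
    void (*bump)(BPoint);
};
typedef struct bmethods *BMethods;

/* Client view: the method table is the first (and only visible) element. */
struct bpoint { BMethods mtable; };

/* Hypothetical stand-in for surely_malloc: allocate or terminate. */
static void *surely_malloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) exit(1);
    return p;
}

/* Implementation I1: stores the coordinate directly. */
struct bpoint_I1 { BMethods mtable; int value; };
static int  get_I1(Point p)        { return ((struct bpoint_I1 *)p)->value; }
static void set_I1(Point p, int i) { ((struct bpoint_I1 *)p)->value = i; }
static void bump_I1(BPoint p)      { ((struct bpoint_I1 *)p)->value++; }

BPoint makeBPoint_I1(int i) {
    struct bpoint_I1 *p = surely_malloc(sizeof *p);
    BMethods m = surely_malloc(sizeof *m);
    m->get = &get_I1; m->set = &set_I1; m->bump = &bump_I1;
    p->value = i; p->mtable = m;
    return (BPoint)p;
}

/* Hypothetical implementation I2: stores a base and a bump count. */
struct bpoint_I2 { BMethods mtable; int base; int bumps; };
static int get_I2(Point p) {
    struct bpoint_I2 *q = (struct bpoint_I2 *)p;
    return q->base + q->bumps;
}
static void set_I2(Point p, int i) {
    struct bpoint_I2 *q = (struct bpoint_I2 *)p;
    q->base = i; q->bumps = 0;
}
static void bump_I2(BPoint p) { ((struct bpoint_I2 *)p)->bumps++; }

BPoint makeBPoint_I2(int i) {
    struct bpoint_I2 *p = surely_malloc(sizeof *p);
    BMethods m = surely_malloc(sizeof *m);
    m->get = &get_I2; m->set = &set_I2; m->bump = &bump_I2;
    p->base = i; p->bumps = 0; p->mtable = m;
    return (BPoint)p;
}

/* Client code: dispatches through the method table only. */
int set_bump_get(BPoint bp, int i) {
    assert(bp != NULL && bp->mtable != NULL);
    BMethods m = bp->mtable;
    m->set((Point)bp, i);
    m->bump(bp);
    return m->get((Point)bp);
}
```

The client function never mentions a concrete representation: it selects an implementation only by choosing a constructor, and every later call goes through the table stored in the first field.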

We again proceed in stages, using the widely used running example of points 
located on a one-dimensional axis (see e.g. [18]). Figure 11 shows a preliminary 
API for basic, bumpable, and colored points, organized in a simple subtyping 
relationship. We provide multiple implementations for each interface (using dif- 


typedef struct point *Point;
struct methods {
  int (*get) (Point);
  void (*set) (Point, int); };
typedef struct methods *Methods;

typedef struct bpoint *BPoint;
struct bmethods {
  int (*get) (Point);
  void (*set) (Point, int);
  void (*bump) (BPoint); };
typedef struct bmethods *BMethods;

typedef struct cpoint *CPoint;
struct cmethods {
  int (*get) (Point);
  void (*set) (Point, int);
  void (*bump) (BPoint);
  int (*getC) (CPoint); };
typedef struct cmethods *CMethods;

struct point { Methods mtable; };
struct bpoint { BMethods mtable; };
struct cpoint { CMethods mtable; };


Fig. 11. PointInterface.h, containing three interfaces for one-dimensional points


ferent data representations), each exposing its set of constructors in a separate 
header file - Figure 12 shows implementation I1. Clients select an implemen- 
tation during object creation but cannot otherwise distinguish between them: 
method dispatch selects the appropriate function from the method table, as in 


BPoint bp = makeBPoint_I1(4); int i = ((BMethods) (bp->mtable))->get((Point)bp).


struct point_I1 { Methods mtable; int value; };
struct bpoint_I1 { BMethods mtable; int value; };
struct cpoint_I1 { CMethods mtable; int value; int color; };

int get_I1 (Point p) { return (((struct point_I1 *)p)->value); }
void set_I1 (Point p, int i) { ((struct point_I1 *)p)->value = i; return; }
void bump_I1 (BPoint p) { ((struct bpoint_I1 *)p)->value++; return; }
int getC_I1 (CPoint p) { return (((struct cpoint_I1 *)p)->color); }
BPoint makeBPoint_I1 (int i) {
  struct bpoint_I1 *p = (struct bpoint_I1 *)surely_malloc(sizeof *p);
  BMethods m = (BMethods)surely_malloc(sizeof *m);
  m->get = &get_I1; m->set = &set_I1; m->bump = &bump_I1;
  p->value = i; p->mtable = m; return ((BPoint)p); }


Fig. 12. Implementation I1. Constructors makePoint_I1 and makeCPoint_I1 omitted.
A second implementation I2 employs representations point_I2 etc. and exposes
constructors makeBPoint_I2 etc.
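
The dispatch idiom of Figures 11 and 12 can be exercised in a small self-contained C sketch. It is an illustration only and not part of the verified development: the interface is reduced to bumpable points, all methods take a BPoint to avoid casts between interface types, surely_malloc is replaced by a stand-in that aborts on failure, and the second implementation (bpoint_I2 with fields base and delta) is a hypothetical representation, since the source does not show I2. The client function bump_twice_and_get is likewise an assumed name.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Interface, reduced to bumpable points (simplified from Fig. 11). */
typedef struct bpoint *BPoint;
struct bmethods {
  int  (*get)(BPoint);
  void (*set)(BPoint, int);
  void (*bump)(BPoint);
};
typedef struct bmethods *BMethods;
struct bpoint { BMethods mtable; };

/* Stand-in for surely_malloc: terminate instead of returning NULL. */
void *surely_malloc(size_t n) {
  void *p = malloc(n);
  if (p == NULL) exit(EXIT_FAILURE);
  return p;
}

/* Implementation I1: stores the coordinate directly (as in Fig. 12). */
struct bpoint_I1 { BMethods mtable; int value; };
int  get_I1(BPoint p)        { return ((struct bpoint_I1 *)p)->value; }
void set_I1(BPoint p, int i) { ((struct bpoint_I1 *)p)->value = i; }
void bump_I1(BPoint p) {
  struct bpoint_I1 *q = (struct bpoint_I1 *)p;
  assert(q->value < INT_MAX);          /* runtime analogue of "bumpable" */
  q->value++;
}
BPoint makeBPoint_I1(int i) {
  struct bpoint_I1 *p = surely_malloc(sizeof *p);
  BMethods m = surely_malloc(sizeof *m);
  m->get = get_I1; m->set = set_I1; m->bump = bump_I1;
  p->value = i; p->mtable = m;
  return (BPoint)p;
}

/* Hypothetical I2: coordinate is base + delta (a different representation). */
struct bpoint_I2 { BMethods mtable; int base; int delta; };
int get_I2(BPoint p) {
  struct bpoint_I2 *q = (struct bpoint_I2 *)p;
  return q->base + q->delta;
}
void set_I2(BPoint p, int i) {
  struct bpoint_I2 *q = (struct bpoint_I2 *)p;
  q->base = i; q->delta = 0;
}
void bump_I2(BPoint p) {
  struct bpoint_I2 *q = (struct bpoint_I2 *)p;
  assert(get_I2(p) < INT_MAX);
  q->delta++;
}
BPoint makeBPoint_I2(int i) {
  struct bpoint_I2 *p = surely_malloc(sizeof *p);
  BMethods m = surely_malloc(sizeof *m);
  m->get = get_I2; m->set = set_I2; m->bump = bump_I2;
  p->base = i; p->delta = 0; p->mtable = m;
  return (BPoint)p;
}

/* Client code: touches objects only through their method tables. */
int bump_twice_and_get(BPoint p) {
  p->mtable->bump(p);
  p->mtable->bump(p);
  return p->mtable->get(p);
}
```

A client that calls bump_twice_and_get on objects created by makeBPoint_I1(4) and makeBPoint_I2(4) observes the same result for both, which is the indistinguishability that the specifications below are meant to capture. Deallocation is omitted for brevity.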


The basis of object specifications is a general method table predicate: 


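\[
\mathsf{MTable}(T, \kappa, \mathit{names}, m, \mathit{specs}, \mathcal{I}) \;=\; \mathsf{Mtok}(\mathsf{Ews}, \kappa, m) \;*\; \exists \pi.\; !(\mathsf{readable}(\pi)) \;\&\&\; \mathop{*}_{(\mu,\phi)\,\in\,\mathit{names}\times\mathit{specs}} \exists v.\; \mathsf{funcptr}(\phi\,\mathcal{I}, v) \,*\, m.\mu \mapsto^{\pi} v.
\]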


It asserts that the struct m (of shape κ) contains at field names names pointers
to functions satisfying specs, where the invariant $\mathcal{I}$ (the last argument of MTable)
is of Coq-type Pred(T) = (T × val) → mpred
and specs has type list (Pred(T) → funspec). A generic object layout predicate


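\[
\begin{array}{l}
N\; T\; \mathit{tbl}\; (\sigma\ \kappa : \mathsf{type})\; \mathit{names}\; \mathit{specs}\; (x : T \times \mathsf{val}) : \mathsf{mpred} \;=\\
\quad \exists \mathcal{I}.\; \mathcal{I}\,x \;*\; \mathsf{Mtok}(\mathsf{Ews}, \delta, \mathsf{snd}\ x) \;*\\
\quad \exists m.\; (\mathsf{snd}\ x).\mathit{tbl} \mapsto^{\mathsf{Ews}} m \;*\; \mathsf{MTable}(T, \kappa, \mathit{names}, m, \mathit{specs}, \mathcal{I})
\end{array}
\]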


then combines a specified method table (located at field tbl) with the requirement
that the (memory identified by the) object pointer satisfy the invariant $\mathcal{I}$. C types σ and δ
represent the object’s static and dynamic types. The joint use of $\mathcal{I}$ in MTable
and N ensures that an object’s methods agree with its data component on what
representation predicate should be maintained. The existential abstraction over
$\mathcal{I}$ ensures representation hiding: external clients merely see an invariant of (Coq)
type T. Thus, different C implementations of an object interface may employ
different representations but still satisfy the same external specification.

Specifically, we introduce Coq-level object interface types in the style of Hof- 
mann and Pierce’s object model [30]: 


Record PointM (X:Type):Type := { get : X -> Z; set : X -> Z -> X; }.
Record BPointM (X:Type):Type :=
{ PointM_of_BPointM :> PointM X; bump : X -> X; bumpable : X -> Prop; }.
Inductive Color:Type := blue | red | green.
Record CPointM (X:Type):Type :=
{ BPointM_of_CPointM :> BPointM X; getC: X -> Color; color_code: Color -> Z; }.

The parameters X represent semantic object representations. On the one hand,
we may instantiate these and define Coq-level behaviors, like m1, bm1, cm1:


Record PointRep := { value : Z }.
Record CPointRep := { pointRep :> PointRep; color : Color }.
Definition m1: PointM PointRep :=
{| get := fun s => value s; set := fun s i => {| value := i |} |}.
Definition bm1: BPointM PointRep :=
{| PointM_of_BPointM := m1; bump := fun s => {| value := value s + 1 |};
bumpable := fun s => min_signed <= value s < max_signed |}.
Definition cm1: CPointM CPointRep := {| ... (* details omitted *) |}.


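But the interface types also enable specifications for get(p) and set(p, j):
\[
\begin{array}{l}
\mathsf{get\_spec}\; T\; (P : \mathsf{PointM}\ T) \;=\; \lambda \mathcal{I} : \mathsf{Pred}\ T.\; \{\mathcal{I}(t,p)\} \leadsto \{\mathsf{get}\ T\ P\ t.\; \mathcal{I}(t,p)\}\\
\mathsf{set\_spec}\; T\; (P : \mathsf{PointM}\ T) \;=\; \lambda \mathcal{I} : \mathsf{Pred}\ T.\\
\quad \{\mathsf{min\_signed} \le j \le \mathsf{max\_signed} \;\&\; \mathcal{I}(t,p)\} \leadsto \{\mathcal{I}(\mathsf{set}\ T\ P\ t\ j,\; p)\}
\end{array}
\]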

Thus, each method has a Coq-level counterpart that is parametric in (semantic)
representations and behaviors. To specify the constructors, we first define
specializations of N for the three interfaces by instantiating with the appropriate
method specifications and syntactic elements:


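\[
\begin{array}{l}
\mathcal{P}\; T\; (P : \mathsf{PointM}\ T) : \mathsf{Pred}\ T \;=\\
\quad N\; T\; \mathsf{mtable}\; \mathsf{point}\; \mathsf{methods}\; [\mathsf{get}; \mathsf{set}]\; [\mathsf{get\_spec}\ T\ P;\; \mathsf{set\_spec}\ T\ P]\\
\mathcal{B}\; T\; (B : \mathsf{BPointM}\ T) : \mathsf{Pred}\ T \;=\\
\quad N\; T\; \mathsf{mtable}\; \mathsf{bpoint}\; \mathsf{bmethods}\; [\mathsf{get}; \mathsf{set}; \mathsf{bump}]\\
\qquad [\mathsf{get\_spec}\ T\ B;\; \mathsf{set\_spec}\ T\ B;\; \mathsf{bump\_spec}\ T\ B]\\
\mathcal{C}\; T\; (C : \mathsf{CPointM}\ T) : \mathsf{Pred}\ T \;=\\
\quad N\; T\; \mathsf{mtable}\; \mathsf{cpoint}\; \mathsf{cmethods}\; [\mathsf{get}; \mathsf{set}; \mathsf{bump}; \mathsf{getC}]\\
\qquad [\mathsf{get\_spec}\ T\ C;\; \mathsf{set\_spec}\ T\ C;\; \mathsf{bump\_spec}\ T\ C;\; \mathsf{getC\_spec}\ T\ C]
\end{array}
\]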


Here, point, bpoint, cpoint and methods, bmethods, cmethods are the structs
defined in the header file (Figure 11) and mtable, get, ..., getC are the field
names in these structs. The exemplary spec for base point constructors is then
\[
\mathsf{makePoint}(i; \mathit{gv}) : \{\mathsf{min\_signed} \le i \le \mathsf{max\_signed} \;\&\; \mathsf{Mem}_{M}\ \mathit{gv}\} \leadsto \{p.\; \mathsf{Mem}_{M}\ \mathit{gv} \,*\, \mathcal{P}\ T\ P\ (\mathsf{Init\_Point}(i), p)\}.
\]


Verifying I1 and I2 then yields VSUs whose export interfaces tie makePoint_I1 and
makePoint_I2 to the specialization of this constructor to P := m1, and similarly
for the other constructors. The resulting objects behave indistinguishably; the
existential quantification over the invariant $\mathcal{I}$ in the definition of N carries over to
the specializations $\mathcal{P}$, $\mathcal{B}$, and $\mathcal{C}$, ensuring that the representational differences
between I1 and I2 are hidden from clients: when verifying a method call, clients
unroll $\mathcal{P}$ etc., but each time receive a “fresh” symbolic representation predicate $\mathcal{I}$.


Wrapper-based verification The unrolling of object predicates corresponds to the 
exposure of the method table in our API. Programmatically, better encapsulation 
is provided by wrappers that hide the function pointer mechanism, like 


int GET (Point p) { Methods m = p->mtable; return (m->get(p)); }

The header file for these wrappers resembles the API of an ADT, but merely
disguises object-orientation: we still support multiple implementations (using 
the same constructors as above), and operations are still invoked using dynamic 
dispatch. On the specification side, wrappers can be modeled as an APD 


Record WrapperAPD := { Wr_Pt: forall T, PointM T -> Pred T;
Wr_BPt: forall T, BPointM T -> Pred T; Wr_CPt: forall T, CPointM T -> Pred T }.


with one constructor per interface, in resemblance to the use of class names to 
index predicate families [56]. The VSU for the wrapper then encapsulates the 
object predicates P etc., exporting an ASI with specifications such as 


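\[
\mathsf{GET}(p) : \begin{array}{l}
\{\mathsf{Wr\_Pt}\ W\ T\ P\ (t,p)\} \leadsto \{\mathsf{get}\ T\ P\ t.\; \mathsf{Wr\_Pt}\ W\ T\ P\ (t,p)\} \;\wedge\\
\{\mathsf{Wr\_BPt}\ W\ T\ P\ (t,p)\} \leadsto \{\mathsf{get}\ T\ P\ t.\; \mathsf{Wr\_BPt}\ W\ T\ P\ (t,p)\} \;\wedge\\
\{\mathsf{Wr\_CPt}\ W\ T\ P\ (t,p)\} \leadsto \{\mathsf{get}\ T\ P\ t.\; \mathsf{Wr\_CPt}\ W\ T\ P\ (t,p)\}
\end{array}
\]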


We can further improve client-side usability by replacing these intersection spec- 
ifications by a deep embedding of the three interface alternatives; this eliminates 
a corresponding case distinction in client-side proofs, when symbolic execution 
reaches the invocation of a wrapper function. As an example, we verified a linked 
list module that permits insertion of basic, bumpable, or colored points and
provides map operations that apply SET, BUMP, ... to all elements. Each element
may internally employ I1 or I2. Of course, the precondition of mapping BUMP 
requires all elements to be of dynamic type (at least) BPoint and have a bumpable 
coordinate; however, this condition emerges as a constraint on semantic objects 
and can be discharged without unfolding object representation predicates. 
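
The wrapper layer and the list client can be sketched in plain C as follows. This is an assumption-laden illustration rather than the verified module: a single method-table struct is used for all interfaces, a NULL bump entry stands in for "dynamic type is only Point", the names map_list, all_bumpable, basic_I1 and bumpable_I1 are hypothetical, and the precondition of mapping BUMP, which the verification discharges statically, is mimicked here by a runtime check.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct point *Point;
struct methods {
  int  (*get)(Point);
  void (*set)(Point, int);
  void (*bump)(Point);          /* NULL for objects that are not bumpable */
};
struct point { const struct methods *mtable; };

/* Wrappers: clients call these and never touch the method table. */
int  GET(Point p)        { return p->mtable->get(p); }
void SET(Point p, int i) { p->mtable->set(p, i); }
void BUMP(Point p)       { p->mtable->bump(p); }

/* One concrete representation; the header is the first member, so a
   pointer to it can be converted back to the full object. */
struct point_I1 { struct point hdr; int value; };
int  get_I1(Point p)        { return ((struct point_I1 *)p)->value; }
void set_I1(Point p, int i) { ((struct point_I1 *)p)->value = i; }
void bump_I1(Point p)       { ((struct point_I1 *)p)->value++; }

const struct methods basic_I1    = { get_I1, set_I1, NULL };
const struct methods bumpable_I1 = { get_I1, set_I1, bump_I1 };

/* Singly linked list of points and a map operation over it. */
struct node { Point elem; struct node *next; };

void map_list(struct node *l, void (*f)(Point)) {
  for (; l != NULL; l = l->next) f(l->elem);
}

/* Runtime stand-in for the precondition of mapping BUMP: every element
   offers bump and its coordinate can be incremented without overflow. */
int all_bumpable(const struct node *l) {
  for (; l != NULL; l = l->next) {
    if (l->elem->mtable->bump == NULL) return 0;
    if (GET(l->elem) == INT_MAX) return 0;
  }
  return 1;
}
```

A caller would check all_bumpable(list) before map_list(list, BUMP); in the verified setting the corresponding obligation is a constraint on semantic objects and needs no inspection of the representation.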


Self and late binding Verification using the above constructions fails for methods 
whose body contains virtual calls on self: the definition of N effectively separates 
the object’s data region from the method table upon method entry, making only 
the former accessible inside the body. To overcome this limitation, we define a 
variant of N using the higher-order recursive functor


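\[
\begin{array}{l}
F\; (\mathcal{I}\ X : \mathsf{Pred}\ T) : \mathsf{Pred}\ T \;=\\
\quad \lambda (x : T \times \mathsf{val}).\; \exists m.\; \mathcal{I}\,x \;*\; \mathsf{Mtok}(\mathsf{Ews}, \delta, \mathsf{snd}\ x) \;*\; (\mathsf{snd}\ x).\mathit{tbl} \mapsto^{\mathsf{Ews}} m\\
\qquad *\; \triangleright\, \mathsf{MTable}(T, \kappa, \mathit{names}, m, \mathit{specs}, X)
\end{array}
\]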


in which the invariant $\mathcal{I}$ is now a parameter (we eschew the parameters T, ..., specs for
readability) and X plays the role of N. Recursion via X is protected by VST’s [4]
modality ▷; indeed, any access to a method table inside a method happens at
least one step later than the method’s own invocation. Contractiveness of F
(proven in VST) ensures the existence of a fixed point $\mathcal{F}(\mathcal{I}) := \mathrm{HORec}(F(\mathcal{I}))$.
Recovering the quantification over $\mathcal{I}$, we then replace N with $N^{*} := \exists \mathcal{I}.\; \mathcal{F}(\mathcal{I})$.
With this modification in place, one may verify virtual calls on self, like a variant
of I1 that implements bump using get and set (still w.r.t. m1, bm1, and cm1).
An important application of self is (observably behavior-altering) method 
overriding. At the semantic level, Hofmann and Pierce explicate how positive
subtyping supports both early and late binding variants of overriding; these differ
in whether the observable behavior of bump (when implemented in terms
of get and set) is affected when a subclass subsequently overrides set to, say, 
reset the coordinate to 0. Furthermore, method overriding may affect how func- 
tions defined in a superclass act on subclass-introduced state components. For 
example, one may impose that updating the coordinate turns a point’s color 
blue. Semantically, all these variations yield novel behaviors m2, bm2, and cm2, 
etc. that can be compared to the earlier behaviors using Hofmann and Pierce’s 
theory. As a consequence of our two-level reasoning, and the choice to param- 
eterise constructor/method specifications by behaviors, we can leverage their 
techniques: implementations I3, I4... that realize the overriding variants can be 
verified as further VSUs for our earlier export interface, by (now) specializing the 
constructor specifications to m2, etc. Afterwards, the modified behaviors propagate
through dynamic dispatch and wrappers as expected, permitting clients
of e.g. the list module to map bump over elements with different behavior. Side 
conditions during symbolic method calls refer exclusively to semantic objects 
and behaviors, do not necessitate the unrolling of representation predicates, and 
can often just be discharged using simplification. 
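
The difference between early and late binding can be made concrete with a small C sketch. It is illustrative only and simplifies the encoding used above: the coordinate is stored directly in struct point, and the method tables base_late, override_late and override_early as well as the functions bump_late, bump_early and set_reset are hypothetical names. The overriding set resets the coordinate to 0, as in the example discussed in the text.

```c
#include <assert.h>
#include <limits.h>

typedef struct point *Point;
struct methods {
  int  (*get)(Point);
  void (*set)(Point, int);
  void (*bump)(Point);
};
struct point { const struct methods *mtable; int value; };

/* Superclass methods. */
int  get_base(Point p)        { return p->value; }
void set_base(Point p, int i) { p->value = i; }

/* Late binding: bump performs virtual calls on self, so an overriding
   set in a subclass changes the observable behavior of bump. */
void bump_late(Point self) {
  int v = self->mtable->get(self);
  assert(v < INT_MAX);
  self->mtable->set(self, v + 1);
}

/* Early binding: bump calls the superclass's own get and set directly. */
void bump_early(Point self) {
  assert(get_base(self) < INT_MAX);
  set_base(self, get_base(self) + 1);
}

/* Subclass override of set: ignore the argument, reset to 0. */
void set_reset(Point p, int i) { (void)i; p->value = 0; }

const struct methods base_late      = { get_base, set_base,  bump_late  };
const struct methods override_late  = { get_base, set_reset, bump_late  };
const struct methods override_early = { get_base, set_reset, bump_early };
```

Calling bump on a point with coordinate 4 yields 5 under base_late and override_early, but 0 under override_late, because only there does the overridden set take effect inside bump. These are exactly the kinds of behavioral variants that the semantic level distinguishes as different behaviors m2, bm2, etc.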


7 Discussion 


Related and future work Certified Abstraction Layers (CAL, [24,26]) are used 
in the CertiKOS project [25] to verify feature-rich operating system kernels and 
hypervisors in Coq. CAL permits horizontal and vertical composition of compo- 
nents, and establishes full abstraction between the imports and exports. CAL’s 
methodology was recently rephrased as a synthesis from a systems-oriented DSL, 
DeepSEA, to C, with a CompCert backend [64]. However, “(T)here is no use of C 
pointers and no built-in support of dynamic memory allocation (every DeepSEA 
object is realized as a set of static variables), so programs that need dynamic 
allocation will have to implement it themselves” ([64], page 10). While this frag- 
ment remarkably suffices for the intended application area, it is unlikely to satisfy 
general-purpose programmers or compiler writers for other systems languages. 
Ironclad Apps and Ironfleet [29,28] are systems based on Dafny and TLA+ 
for verifying safety and liveness of distributed systems, and app security. By 
connecting model-level, concurrency-aware reasoning, state-machine refinement, 
and Floyd-Hoare verification, their approach provides abstraction-bridging func- 
tionality similar to that of proof-assistant-based reasoning, trading off TCB size 
and foundational integration in a logical framework against automation and
developer productivity. Ironclad Apps compile to verified assembly; Ironfleet 
employs a formally unverified route via Dafny and the .NET compiler for C#. 
Uberspark [67] is a system based on Frama-C and SMT for compositionally 
verifying commodity system software written in C and assembly. Uberspark’s pri- 
mary applications are hypervisor components and OS kernels, but it currently 
addresses only safety and security properties (memory separation, control-flow 
integrity, information flow) rather than functional correctness. The same limitation
applies to proof-carrying code systems [49,3,6,13,27], at (virtual) machine
or assembly level. Several PCC systems proposed hierarchies of formalisms that 
connect operational semantics, a general-purpose program logic, and tactical 
checkers or algorithmic inference systems for higher-level type systems, abstract 
interpretation, or program analyses [16,2,17,1]. VST’s tactical automation is op- 
timized for symbolic execution and functional correctness, but the underlying 
proof rules could equally well be used to prove soundness of static analyses or 
code synthesizers; we expect our structuring principles for separate compilation 
will be just as useful in these scenarios as they are for functional correctness. 


McKinna and Burstall [45] pioneered the use of existential abstraction to for- 
mally tie programs to their specifications and proofs in a modern proof assistant. 
VSU realizes aspects of their vision of deliverables for a mainstream language but 
is at this point not endowed with similarly rigorous categorical underpinnings. 


Representation hiding in separation logic can also be obtained using hypo- 
thetical frame rules [54,10], but no such rule is provided by VST at present. 
Pragmatically, the two approaches appear complementary: modules that expose 
interesting state (e.g. a list ADT, the point objects,...) favor existential abstrac- 
tion/APDs, as clients can access associated reasoning principles on demand, at 
specific program points. In contrast, modules like the resource-unaware memory 
manager might benefit from hypothetical framing: the predicate $\mathsf{Mem}_{M}\ \mathit{gv}$ carries
no client-relevant information but still needs to be carried around in many
function specifications in our treatment. 


VST’s specification subsumption resembles behavioral subtyping [44,42], a 
notion commonly used in verification tools for Java-like languages for relating 
specifications across a class hierarchy. Exploring the relationship between our use 
of positive subtyping, other notions of subtyping and inheritance, and Liskov’s 
Substitution Principle [43] constitutes future work. 


By supporting field update, Hofmann and Pierce’s theory addresses short- 
comings of purely functional object models, but its support for object aggregates 
or complex ownership structures appears limited and not much studied. A two- 
level encoding could likely also be developed for concurrency-inspired object 
models [33,32,31], perhaps by adapting the theory of interaction trees [68,39]. 
However, VST’s partial-correctness interpretation of triples limits the end-to-end 
usefulness of coinductive reasoning. A recent proposal for integrating statically 
typed Smalltalk-inspired objects into a functional calculus is Wyvern [52]. 


In the context of SMT-based verification tools, Parkinson and Bierman [57] 
highlight examples that go beyond behavioral subtyping, and Summers et al. [66] 
identify a catalog of advanced uses of class invariants. We intend to apply VST 
to the former soon; a better understanding of the latter could perhaps commence 
by recasting Drossopoulou et al.’s general framework for object invariants [22] 
in separation logic. However, some aspects of class/object invariants may not 
immediately transfer from Java-like to Smalltalk-style object models. 


In Java, an object’s representation remains constant over its lifetime. By 
separately quantifying over the invariant $\mathcal{I}$, our pre- and postconditions may support dynamic
representation change à la Fickle [21] (with suitable updates to the method
table), as long as both representations fit into an object’s top-level struct. 

Krishnaswami et al. [41] verify subject-observer and other patterns (itera- 
tors, flyweight, factory) by equipping a functional language, Idealized ML, with 
effectful specifications based on higher-order separation logic. Their verification 
was partially formalized in a predicative Hoare Type Theory/Ynot and employs 
abstract module definitions that combine code and specification. Their use of 
separating implication can likely be transferred to our setting, but their im- 
plementation does not separate the functionality of subjects and observers to 
the same extent and thus does not raise the same specification challenges. Considerate
reasoning [65], object propositions [51], and multi-object languages such as
Rumer [9] are alternatives in the design space spanned by invariant techniques,
aliasing/separation and ownership; all validate variants of the Composite pattern.

Extrapolating from our exploration of the Composite pattern, it appears 
feasible to generate VST specifications, loop invariants, and APD declarations 
from VeriFast [34]; synthesizing full proofs will be more challenging.

Object encodings in the Linux kernel, GTK/GObject, or the SQLite database 
engine deviate from the Smalltalk tradition and expose APIs that are not fully 
dataless. We suspect these systems also differ from standard language-level ob- 
ject disciplines in their need for deeply layered ownership control or model-level 
object aggregates. Like Schreiner’s encoding [63], these systems thus provide 
interesting opportunities for future case studies. 


Conclusion The ability of type theory to capture modularity and abstraction is 
well-established. But while, e.g. Mitchell and Plotkin’s insight has been highly 
influential in the world of functional programming, it has not yet made its way 
into verification tools for mainstream languages. Taking inspiration from their 
work, we introduced Verified Software Units as a general component calculus 
for VST, and developed an infrastructure for separating the declarations of ab- 
stract predicates from concrete predicate definitions. We showed that residual 
predicates support callbacks which violate operation atomicity, as is the case 
in the subject-observer pattern. Finally, we introduced a two-level approach to 
specifying object principles, yielding a simple logic for Smalltalk-style objects in 
C. Together, these innovations substantially advance VST’s capability to verify 
modular C developments that employ diverse programming styles.

Abstract. While recent progress in quantum hardware opens the door
for significant speedup in certain key areas, quantum algorithms are still
hard to implement right, and the validation of such quantum programs
is a challenge. In this paper we propose QBRICKS, a formal verification
environment for circuit-building quantum programs, featuring both parametric
specifications and a high degree of proof automation. We propose
a logical framework based on first-order logic, and develop the main tool
we rely upon for achieving the automation of proofs of quantum specification:
PPS, a parametric extension of the recently developed path sum
semantics. To back up our claims, we implement and verify parametric
versions of several famous and non-trivial quantum algorithms, including
the quantum parts of Shor’s integer factoring, quantum phase estimation
(QPE) and Grover’s search.


Keywords: deductive verification, quantum programming, quantum circuits 


1 Introduction 


1.1 Quantum computing. Quantum programming is seen as a potential
revolution for many computing applications: cryptography [61], deep learning [7],
optimization [23,22], solving linear systems [83], etc. In all of these domains,
current quantum algorithms beat the best known classical algorithms by either
quadratic or even exponential factors. In parallel to the rise of quantum algorithms,
the design of quantum hardware has moved from lab-benches [14] to
programmable, 50-qubit machines designed by industrial actors [43,88], reaching
the point where quantum computers beat classical computers for specific tasks
[4]. This has stirred a shift from a theoretical standpoint on quantum algorithms
to a more programming-oriented view, with the question of their concrete coding
and implementation [66,65,55].

In this context, an important problem is the adequacy between the mathematical
description of an algorithm and its concrete implementation as a program.

Fig. 1: The hybrid model (recoverable edge labels: Instructions, Feedback)

1.2 The hybrid model. The vast majority of quantum algorithms are described
within the quantum co-processor model [42], i.e. a hybrid model where a
classical computer controls a quantum co-processor holding a quantum memory
(cf. Figure 1). The co-processor is able to apply a fixed set of elementary operations
(buffered as quantum circuits) to update and query (measure) the quantum
memory. Importantly, while measurement allows one to retrieve classical (proba- 
bilistic) information from the quantum memory, it also modifies it (destructive 
effect). The quantum memory state is represented by a linear combination of 
possible concrete values, generalizing the classical notion of probabilities to the 
complex case, and the core of a quantum algorithm consists in successfully set- 
ting the memory in a specific quantum state. 

Major quantum programming languages such as Quipper [80], Liqui|⟩ [67],
Q# [64], ProjectQ [63], Silq [8], and the rich ecosystem of existing quantum 
programming frameworks follow this hybrid model. Such circuit-building 
quantum languages are the current consensus for high-level executable quantum 
programming languages. 


1.3 The problem with quantum algorithms. Starting from an initial 
state, a quantum algorithm typically describes a series of high-level operations 
which, once composed, realize the desired state. Each high-level operation may 
itself be described in a similar way, until one reaches elementary operations 
(quantum gates). Describing an algorithm therefore requires both to list these 
elementary operations, or quantum circuit, and to specify the circuit’s behavior. 

A major issue is then to verify that the quantum circuit generated by the 
code written as an implementation of a given algorithm is indeed a run of this 
algorithm, and that the circuit has indeed the specified characteristics of shape 
(for instance: a polynomial size). 

While testing and debugging are the common verification practice in clas- 
sical programming, they become extremely complicated in the quantum case: 
debugging and assertion checking are problematic due to the destructive aspect 
of quantum measurement, the probabilistic nature of quantum algorithms seri- 
ously impedes system-level quantum testing, and classical emulation of quantum 
algorithms is (strongly believed to be) intractable. On the other hand, nothing 
prevents a priori the formal verification of quantum programs. 


1.4 Goal and challenges. Our goal is to provide an automated formal ve- 
rification framework for circuit-building quantum programs. Such a framework 
should satisfy the following principles: (1) Parametricity: it should allow para- 
metric (i.e. scale-invariant) specifications and proofs, so as to enable the generic 
specification and verification of parametrized implementations. This is crucial as 
quantum algorithms always describe parametrized families of circuits; (2) Proof 
automation: it should, as far as possible, provide automatic proof means in order 
to ease adoption. 

Prior works on quantum formal verification do not fully reach these goals
together, as they are either not parametric, or not automated. Model-checking
methods are fully automatic but not parametric; moreover, they are
highly scale-sensitive. Recently, Amy [1,2] developed a powerful framework for
reasoning over quantum circuits, the path-sums symbolic representation. Thanks
to their good compositional properties, reasoning with path-sums is well automated
and can scale up to large problem instances (up to 100 qubits). Yet, the
method is not parametric and only addresses fixed-size circuits. On the other
side of the spectrum, several approaches deal with parametricity but sacrifice
automation as they generate proof obligations in higher-order logic, supported
with proof assistants such as Coq or Isabelle/HOL. One can cite the approach of
Boender et al. [10], Qwire [53,58], SQIR [85,34] or QHL [68,71,46,69,45]. Combined
with the use of the standard matrix semantics for quantum circuits,
which we show in Section 8 to be cumbersome for automation, only very few realistic
quantum programs have been verified in a parametric way [45,35,84].


1.5 Proposal. We propose QBRICKS, an automated formal verification frame- 
work for circuit-building quantum programs, featuring parametric specification 
together with a high degree of proof automation. 

We bring two key innovations along the road: (Key 1) we propose the new
parametrized path-sums (PPS) symbolic representation of families of quantum
circuits, extending path-sums [1] to the parametric case while keeping good compositional
properties. PPS prove extremely useful both as a specification mechanism
and as an automation mechanism; (Key 2) we carefully tune together our
programming language (QBRICKS-DSL) and specification logic (QBRICKS-SPEC)
so that the corresponding verification problem remains automatable in practice
(first-order proof obligations) while the framework is still expressive enough
to write, specify and verify realistic quantum programs (Shor order finding,
Shor-OF [61]; QPE [41,16]; Grover [31]).


1.6 Contributions. We bring the following contributions. 


— A flexible symbolic representation for reasoning about quantum states, building
upon the recent path-sum symbolic representation [1,2]. Our representation,
called parametrized path-sums (PPS), retains the compositional
and closure properties of regular path-sums while allowing genericity and 
parametricity of both specifications and proofs. Especially, first-order logic 
together with PPS provide a unified and powerful way to reason about many 
essential quantum concepts (Section 5.2) and fit well with the standard way 
of describing quantum algorithms. We are the first to highlight this connec- 
tion and make PPS a “first-class” concept, where prior works are limited to 
standard path sums, or rely on the standard matrix semantics; 

— A programming and verification framework, that is: on one hand, a core 
domain-specific language (QBRICKS-DSL, Section for describing fami- 
lies of quantum circuits, with enough expressive power to describe parame- 
tric circuits from non-trivial quantum algorithms; and on the other hand, a 
first-order logical (domain-specific) specification language (QBRICKS-SPEC,
Section 5), tightly integrated with PPS and QBRICKS-DSL to specify properties
of parametrized programs representing families of quantum circuits.
The careful interplay between these two components yields first-order proof 
obligations, and thus is a key aspect of proof automation;
— A dedicated proof engine: we introduce the Hybrid Quantum Hoare Logic
(HQHL) deduction system for deductive verification over circuit-building
quantum programs. It is tightly coupled with PPS and produces proof obligations
in the QBRICKS-SPEC logic (Section 6);

— This framework is embedded in the Why3 deductive verification tool [9,24]
as a DSL (Section 7), and provides proof automation mechanisms dedicated
to the quantum case. This material is grounded in standard mathematics
theories (linear algebra, arithmetic, complex numbers, binary operations,
etc.) with 450+ definitions and 1,000+ lemmas. All lemmas have been proved
in Why3, and the whole framework is publicly available;

— We present in Section 8 the first ever verified parametric implementation of
the quantum part of Shor’s factoring algorithm (Order Finding, including
the polynomial complexity of the circuits produced by our implementation
and probability requirements), as well as verified parametric implementations
of other major quantum algorithms: Quantum Phase Estimation
(QPE) [41,16]³, Grover’s (search) algorithm [31] and Quantum Fourier
Transform (QFT). Our method achieves a high level of proof automation
(96% on Shor-OF), significantly reducing proof effort (factor 13.6x vs. QHL
on Grover, factors 7.7x and 6.4x vs. SQIR on resp. QPE and Grover).


Additional technical material can be found in the online extended version [13].
Implementation and benchmarks are available online [54].


1.7 Discussion. The scope of this paper is limited to proving properties of 
circuit-building quantum programs. We do not claim to support right now the 
interactions between classical data and quantum data (referred to as “classical 
control” in the literature), nor the probabilistic side-effect resulting from the 
measurement. Still, we are already able to target realistic implementations of 
famous quantum algorithms, and thanks to equational theories for complex and 
real numbers we can automatically reason on the probabilistic outcome of a measurement.
Also, we do not claim any novelty in the proofs for Shor-OF, QPE or
Grover by themselves, but rather the first highly-automated parametric correct- 
ness proofs of the circuits produced by programs implementing them, and the 
first parametric correctness proofs of an implementation of Shor-OF. 


2 Background: Quantum Algorithms and Programs 


While in classical computing, the state of a bit is either 0 or 1, in quantum
computing [50] the state of a quantum bit (or qubit) is described by amplitudes
over the two elementary values 0 and 1 (denoted in the Dirac notation with
$|0\rangle$ and $|1\rangle$), i.e. linear combinations $\alpha_0|0\rangle + \alpha_1|1\rangle$ where $\alpha_0$ and $\alpha_1$ are any
complex values satisfying $|\alpha_0|^2 + |\alpha_1|^2 = 1$. In a sense, amplitudes are a generalization
of probabilities. More generally, the state of a qubit register of n qubits
(“qubit-vector”) is any superposition of the $2^n$ elementary bit-vectors (“basis element”,
where a bit-vector $k \in \{0..2^n - 1\}$ is denoted $|k\rangle_n$), that is any
$|u\rangle_n = \sum_{k=0}^{2^n-1} \alpha_k |k\rangle_n$ such that $\sum_{k=0}^{2^n-1} |\alpha_k|^2 = 1$. For example, in the case of two qubits,
the basis is $|00\rangle$, $|01\rangle$, $|10\rangle$ and $|11\rangle$ (also abbreviated $|0\rangle_2$, $|1\rangle_2$, $|2\rangle_2$ and $|3\rangle_2$).
Such a (quantum state) vector $|k\rangle_n$ is called a ket of length n (and dimension
$2^n$).

3 QPE is a major quantum building block, at the heart of, e.g., the HHL logarithmic
linear system solving algorithm or quantum simulation [28].

Technically speaking, we say that the quantum state of a register of n qubits
is represented by a normalized vector in a Hilbert space of finite dimension
$2^n$ (a.k.a. finite-dimensional Hilbert space), whose basis is generated by the
Kronecker product (a.k.a. tensor product, denoted $\otimes$) over the elementary bit-vectors.
For instance, for n = 2: $|0\rangle \otimes |0\rangle$, $|0\rangle \otimes |1\rangle$, $|1\rangle \otimes |0\rangle$ and $|1\rangle \otimes |1\rangle$ act as
definitions for $|00\rangle$, $|01\rangle$, $|10\rangle$ and $|11\rangle$.


2.1 Quantum data manipulation. The core of a quantum algorithm consists in
manipulating a qubit register through two main classes of operations.
(1) Quantum gate. Local operation on a fixed number of qubits, whose action
consists in the application of a unitary map to the corresponding quantum state
vector, i.e. a linear and bijective operation preserving norm and orthogonality.
The fact that unitary maps are bijective ensures that every unitary gate admits
an inverse. Unitary maps over n qubits are usually represented as
$2^n \times 2^n$ matrices. (2) Measurement. The retrieval of classical
information out of the quantum memory. This operation is probabilistic and
modifies the state of a quantum register: measuring the n-qubit system
$\sum_{k=0}^{2^n-1} \alpha_k |k\rangle_n$ returns the bit-vector k of length n
with probability $|\alpha_k|^2$. Quantum gates might be applied in sequence or
in parallel: sequential application corresponds to map composition (or,
equivalently, matrix multiplication), while parallel application corresponds to
the Kronecker product, or tensor product, of the original maps or, equivalently,
the Kronecker product of their matrix representations.⁴
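The correspondence between gate composition and matrix operations can be illustrated with a small sketch in plain Python. It is not part of QBRICKS: the helper names (`matmul`, `kron`, `apply`, `probabilities`, `close`) and the use of the NOT gate `X` as a second example gate are assumptions made here for illustration only.

```python
import math

def matmul(a, b):
    """Matrix product a.b: sequential composition (b is applied first)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Kronecker product of a and b: parallel composition."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def apply(m, state):
    """Apply a matrix to a state vector given as a list of amplitudes."""
    return [sum(m[i][j] * state[j] for j in range(len(state)))
            for i in range(len(m))]

def probabilities(state):
    """Measurement: basis vector k is observed with probability |a_k|^2."""
    return [abs(a) ** 2 for a in state]

def close(a, b, eps=1e-12):
    """Entry-wise comparison of two matrices up to rounding errors."""
    return all(abs(x - y) < eps
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]   # Hadamard gate
X = [[0, 1], [1, 0]]    # NOT gate, used only as a second example gate
ID = [[1, 0], [0, 1]]

# Two Hadamard gates in sequence give the identity.
hh = matmul(H, H)
# Mixed-product equality of the Kronecker product:
# (A kron B).(C kron D) = (A.C) kron (B.D).
lhs = matmul(kron(H, X), kron(X, H))
rhs = kron(matmul(H, X), matmul(X, H))
# (H kron H) applied to |00> gives the uniform distribution on 2-bit vectors.
probs = probabilities(apply(kron(H, H), [1, 0, 0, 0]))
```

The matrices grow as $2^n \times 2^n$, which is one reason why this representation becomes impractical for reasoning about large or parametrized circuits.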


4 Given two matrices A (with r rows and c columns) and B, their Kronecker
product is the matrix
$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1c}B \\ \vdots & \ddots & \vdots \\ a_{r1}B & \cdots & a_{rc}B \end{pmatrix}.$$
This operation is central in quantum information representation. It enjoys a
number of useful algebraic properties such as associativity, bilinearity or the
equality $(A \otimes B) \cdot (C \otimes D) = (A \cdot C) \otimes (B \cdot D)$,
where $\cdot$ denotes matrix multiplication.


2.2 Quantum circuits. In a way similar to classical Boolean functions, the
application of quantum gates can be written in a diagrammatic notation: quantum
circuits. Qubits are represented with horizontal wires and gates with boxes.
Circuits are built compositionally, from a given set of atomic gates and by a
small set of circuit combinators, including: parallel and sequential
compositions, circuit inverting, controlling, iteration, ancilla creation, etc.
As an example of a quantum circuit, we show in Figure 2 the bird’s-eye view of
the circuit for QPE, the (quantum) phase estimation algorithm, a standard
primitive in many quantum algorithms. QPE is parametrized by n (a number of
wires) and U (a unitary oracle) and is built as follows. First, a register of n
qubits is initialized in state $|0\rangle$, while another one is initialized in
state $|v\rangle$. Then comes the circuit itself: a structured sequence of
quantum gates, using the unary Hadamard gate H, the circuits $U^{2^i}$
(realizing U to the power $2^i$) and the reversed Quantum Fourier Transform
inverse(QFT(n)). Sub-circuits $U^{2^i}$ and inverse(QFT(n)) are both defined
in a similar way.

[Figure 2: circuit diagram with Hadamard gates H on the wires initialized to
$|0\rangle$, controlled powers of U, and a final invert(QFT(n)) block]

Fig. 2: The circuit for QPE

Here, one should simply note two things: (1) the circuit is made of parallel
compositions of Hadamard gates and of sequential compositions of controlled
$U^{2^i}$ (the controlled operation is depicted with vertical lines and the
symbol •); (2) the circuit is parametrized by n and by U. This is very common:
in general, a quantum algorithm constructs a circuit whose size and shape depend
on the parameters of the problem. It describes a family of quantum circuits.


2.3 Reasoning on circuits and the matrix semantics. Quantum circuits
essentially describe unitary operators acting on Hilbert spaces. In finite
dimension, unitary matrices faithfully represent unitary operators: this has
been the original mathematical formalism for circuits, coined here as the matrix
semantics. While this representation is well-adapted for representing simple
high-level circuit combinators such as the action of control or inversion, it is
not well-suited for specifying the behavior of many complex circuits coming from
the literature. Because of this cumbersomeness, textbook descriptions of circuits make
use of an informal representation: operators are described by their action on a 
basis vector (see, for example the description of Shor-OF in p. 232]). This 
is however understood as a shortcut notation for matrices, which remain the
main medium for reasoning on circuits. Formal approaches to quantum computation
witness this prevalence of matrices as circuit representation.


2.4 Path-sum representation. Path sums [1,2] are a recent symbolic
representation. Their strength is to formalize the notation used in the quantum
algorithm literature (e.g., [50]). A unitary operator U is written as
$U : |x\rangle \mapsto PS(x)$ where x is a bit vector and PS(x) is defined with
the syntax of Fig. 3. In the figure, addition and multiplication over reals are
denoted respectively with + and $\cdot$, and $x_i$ is the $i$-th projection of
bit vector x. The term n is an integer index, characterizing the range of the
path-sum. Then each term $k \in [0, 2^n[$ in the path-sum is defined through:


1. the phase polynomial $P_k(x)$, a real value building the complex scalar
$e^{2\pi i P_k(x)}$;


2. the basis-ket function $\phi_k(x)$, defining the ket-vector
$|\phi_k(x)\rangle$ this scalar value applies to.


$$PS(x) ::= \frac{1}{\sqrt{2^n}} \sum_{k=0}^{2^n-1} e^{2\pi i P_k(x)} |\phi_k(x)\rangle$$

$$P_k(x) ::= x_i \mid P_k(x) \cdot P_k(x) \mid P_k(x) + P_k(x)$$

$$|\phi_k(x)\rangle ::= |b_1(x)\rangle \otimes \cdots \otimes |b_n(x)\rangle$$

$$b(x) ::= x_i \mid \neg b(x) \mid b(x) \wedge b(x) \mid b(x)\ \mathrm{XOR}\ b(x) \mid tt \mid ff$$

Fig. 3: Syntax for regular path-sums [2,1]


This representation is closed under functional composition and Kronecker
product. For instance, if U is defined as in Fig. 3 and if V sends y to PS'(y)
defined as
$\frac{1}{\sqrt{2^{n'}}} \sum_{k=0}^{2^{n'}-1} e^{2\pi i P'_k(y)} |\phi'_k(y)\rangle$,
then $U \otimes V$ sends $|x\rangle \otimes |y\rangle$ to

$$\frac{1}{\sqrt{2^{n+n'}}} \sum_{j=0}^{2^{n+n'}-1} e^{2\pi i \left(P_{j/2^{n'}}(x) + P'_{j\%2^{n'}}(y)\right)} |\phi_{j/2^{n'}}(x)\rangle \otimes |\phi'_{j\%2^{n'}}(y)\rangle \qquad (1)$$

which is in the form shown in Figure 3. The compositionality of this semantics
is used by Amy [2] to prove the equivalence of large circuit instances.
Nonetheless, its main limitation lies in the fact that path-sums only address
fixed-size circuits. Albeit a compositional tool, useful to automate proofs, it
cannot be used for proving properties of parametrized circuit-building quantum
programs.


This paper proposes an extension of path-sum semantics to address the para- 
metric verification of quantum programs. 
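As a rough illustration of fixed-size path-sums, the sketch below evaluates a path-sum on a basis ket and builds the Kronecker product of two path-sums by splitting the summation index into a quotient and a remainder. The encoding is an assumption made here, not the paper's: bit vectors are Python integers, and `PathSum`, `apply_ps` and `kron_ps` are hypothetical names.

```python
import cmath
import math
from collections import namedtuple

# width: number of qubits; rng: the sum ranges over 2**rng indices k;
# phase(k, x): real phase polynomial; ket(k, x): output basis vector.
PathSum = namedtuple("PathSum", "width rng phase ket")

def apply_ps(ps, x):
    """Amplitudes of PS(x), as a dict from output basis vector to amplitude."""
    amps = {}
    norm = 1 / math.sqrt(2) ** ps.rng
    for k in range(2 ** ps.rng):
        out = ps.ket(k, x)
        amps[out] = amps.get(out, 0) + norm * cmath.exp(2j * math.pi * ps.phase(k, x))
    return amps

def kron_ps(u, v):
    """Path-sum of u (x) v: index j is split into j / 2**v.rng and j % 2**v.rng."""
    k_mask, x_mask = 2 ** v.rng - 1, 2 ** v.width - 1
    return PathSum(
        u.width + v.width,
        u.rng + v.rng,
        lambda j, z: u.phase(j >> v.rng, z >> v.width) + v.phase(j & k_mask, z & x_mask),
        lambda j, z: (u.ket(j >> v.rng, z >> v.width) << v.width) | v.ket(j & k_mask, z & x_mask),
    )

# Hadamard: two paths, phase k*x/2, output |k>. Identity: one path, output |x>.
HAD = PathSum(1, 1, lambda k, x: k * x / 2, lambda k, x: k)
IDENT = PathSum(1, 0, lambda k, x: 0, lambda k, x: x)

h_on_1 = apply_ps(HAD, 1)                         # H|1>
h_id_on_10 = apply_ps(kron_ps(HAD, IDENT), 0b10)  # (H (x) Id)|10>
```

Each such object describes one circuit of one fixed size, which is the limitation the parametrized version is meant to lift.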


3 Introducing PPS 


In this section, we introduce the main logical apparatus of our framework: pa- 
rametrized path-sums. We first present a motivating example and then discuss 
the construction. 


3.1 Motivating example. Let us consider the n-indexed family of circuits
consisting of n Hadamard gates, in sequence. A sequence of two Hadamard gates
can easily be shown equivalent to the identity operation. In other words, when
fed with $|0\rangle$, if n is even the circuit outputs $|0\rangle$.

A circuit $C_n$ defined as a sequence of n Hadamard gates H.
Precondition: $n \geq 0$ is even.
Post-conditions: { $C_n$ sends $|x\rangle$ to $|x\rangle$; $C_n$ consists of n gates. }

Fig. 4: Motivating Example

Albeit small, this circuit family together with its simple specification
exemplifies the typical framework we aim at in the context of certification of
quantum programs:


— The description of the circuit family is parametrized by a classical parameter 
(here, the non-negative integer n); 
— The pre-condition imposes both constraints (here, the evenness of n) and
soundness conditions (here, the non-negativeness of n) on the parameters; 

— The post-condition can both refer to the semantics of the circuit result and 
to its form and shape (here, its size). 


The circuit family presented in Figure 4 will be used in the rest of the paper
as a running, toy example for QBRICKS. In particular, we show in Example 1 how
to code it in our framework, and we show in a later example how to express its
specification. Its parametrized implementation in QBRICKS is three lines of code
long and its specification takes six lines. It is proved by induction over the
parameter n, the induction step requiring two lemma calls (depending on the
evenness of parameter n).


3.2 Parametrizing path-sums. In order to formalize the semantics of the
example of Fig. 4, we aim at generalizing path-sums.


Illustration. For a fixed n, the circuit $C_n$ implements either the identity
(when n is even), in which case the path-sum is

$$PS_{Id}(x) = \frac{1}{\sqrt{2}^{0}} \sum_{k=0}^{0} e^{2\pi i \cdot 0} |x\rangle,$$

or the Hadamard gate (when n is odd), in which case the path-sum is

$$PS_{H}(x) = \frac{1}{\sqrt{2}} \sum_{k=0}^{1} e^{2\pi i \frac{k \cdot x}{2}} |k\rangle.$$

A candidate parametric path-sum for the family of circuits $\{C_n\}_n$ from
Figure 4 could then be written in factorized form as

$$PS_n(x) = \frac{1}{\sqrt{2}^{\,n\%2}} \sum_{k=0}^{2^{n\%2}-1} e^{2\pi i \frac{(n\%2) \cdot k \cdot x}{2}} |\text{if even}(n) \text{ then } x \text{ else } k\rangle. \qquad (2)$$

Generalization. In general, parametrized path-sums (PPS) are defined over
a language of typed terms with possibly free (typed) variables. At the very
minimum the language has to be equipped with Boolean values (to handle the
values of the ket-vector) and integers (for instance to handle the range).

Given such a language, a PPS is a path-sum where the range, the phase
polynomial and the basis-ket can in general be explicit, open terms referring to
external parameters. Formally, a PPS h is defined as a function inputting a set
of parameters p and outputting:


— a parametrized integer pps_width(h, p), featuring the number of qubits the
target circuit is acting on (its width);

— another parametrized integer pps_range(h, p), abbreviated as r(h, p). It
indicates the range of the sum (defined as the set $BV_{r(h,p)}$ of bit vectors
of length r(h, p));

— a basis ket function pps_ket(h, p), generalizing the term $\phi$ from Fig. 3.
For any pair (x, y) of a bit vector x of length pps_width(h, p) (standing for an
input basis vector) and a bit vector y of length r(h, p), it returns a bit
vector of length pps_width(h, p) (standing for an output basis vector);

— a parametrized angle function pps_angle(h, p)(x, y), generalizing the phase
polynomial P from Fig. 3. For any pair (x, y) as above, it returns a real
value $\theta$.
Then, the behaviour of a parametrized quantum circuit C(p) is described as
the i/o function inputting a basis ket $|x(p)\rangle$ whose length is the width
of C(p) and outputting the parametrized sum term:

$$\texttt{pps\_apply}(h, p)(|x(p)\rangle) = \frac{1}{\sqrt{2}^{\,r(h,p)}} \sum_{y \in BV_{r(h,p)}} e^{2\pi i \cdot \texttt{pps\_angle}(h,p)(x,y)} \, |\texttt{pps\_ket}(h,p)(x,y)\rangle$$

For the sake of readability, we often omit the explicit mention of the
parameters. For instance, the PPS P induced by (2) is parametrized by the
integer n. It is such that for any int n, pps_width(P, n) = 1 and
pps_range(P, n) = n%2. Furthermore, for any bit vectors x, y of lengths 1 and
n%2, pps_ket(P, n)(x, y) is equal to x if n is even and to y otherwise, and
pps_angle(P, n)(x, y) = $\frac{(n\%2) \cdot x[0] * y[0]}{2}$. One then gets
expression (2) by applying pps_apply(P, n).

Hence, the term language needed for describing PPS of otherwise sophisti- 
cated families of quantum circuits can afford to be minimal: first-order typed 
terms equipped with an equational theory are enough. We also find out that 
first-order, predicate logic is suitable for specifying the properties of quantum 
programs: there is no need for higher-order logic such as the ones of Coq or 
Isabelle/HOL. This is the key to automation. 
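The PPS of the running example can be tried out numerically with the following sketch. It mirrors the four components listed above (width, range, ket and angle) for the family of n Hadamard gates and compares the result of `pps_apply` with a direct simulation. Several choices are assumptions made here: bit vectors of length at most 1 are encoded as integers (the empty vector as 0), the PPS argument h is left implicit, the angle carries the factor 1/2 needed to obtain the phase $(-1)^{x \cdot y}$, and `hadamard_chain` is a hypothetical reference implementation.

```python
import cmath
import math

def pps_width(n):
    return 1                      # the circuit acts on a single qubit

def pps_range(n):
    return n % 2                  # one path if n is even, two if n is odd

def pps_ket(n, x, y):
    return x if n % 2 == 0 else y

def pps_angle(n, x, y):
    return (n % 2) * x * y / 2

def pps_apply(n, x):
    """Amplitude vector obtained by summing over all bit vectors y of length r."""
    r = pps_range(n)
    norm = 1 / math.sqrt(2) ** r
    amps = [0j] * 2 ** pps_width(n)
    for y in range(2 ** r):
        amps[pps_ket(n, x, y)] += norm * cmath.exp(2j * math.pi * pps_angle(n, x, y))
    return amps

def hadamard_chain(n, x):
    """Reference: apply the Hadamard gate n times to the basis state |x>."""
    s = 1 / math.sqrt(2)
    state = [1.0 if k == x else 0.0 for k in range(2)]
    for _ in range(n):
        state = [s * (state[0] + state[1]), s * (state[0] - state[1])]
    return state

def agrees(n, x, eps=1e-9):
    """True when the PPS and the direct simulation give the same amplitudes."""
    return all(abs(a - b) < eps
               for a, b in zip(pps_apply(n, x), hadamard_chain(n, x)))
```

A numerical check of this kind only covers finitely many values of n. The framework instead aims at proving the statement for every n.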


4 QBRICKS-DSL 


QBRICKS-DSL is the (domain-specific) language of our framework. It is designed
as a first-order, functional language aimed at circuit description. Measurement
is out of the scope of the language, and all QBRICKS-DSL expressions are
terminating. We follow a very simple strategy for circuit building: we use a
regular inductive datatype for circuits, where the data constructors are
elementary gates, sequential and parallel composition, and ancilla creation. In
particular, unlike Quipper [30] or Qwire [53], a quantum circuit is not a
function acting on qubits: it is a simple, static object. Nonetheless, as
illustrated by our experiments, this impedes neither expressiveness nor
parametricity.

Furthermore, even if the language does not feature measurement, it is
nonetheless possible to reason on the probabilistic outputs of circuits, as if
we were to measure the result of a circuit. Indeed, this can be expressed in a
regular theory of real and complex numbers (see Section 6.5).


4.1 Syntax of QBRICKS-DSL. QBRICKS-DSL is a small first-order functional,
call-by-value language with a special datatype circ as the medium to build and
manipulate circuits. The core of QBRICKS-DSL can be presented as the
simply-typed calculus of Figure 6. The basic data constructors for circ are
CNOT, SWAP, ID, the Hadamard superposition gate H, the phase shift gate Ph(e)
and the parametrized rotation Rz(e). The constructors for high-level circuit
operations are sequential composition SEQ, parallel composition PAR and ancilla
creation/termination ANC (see Figure 5 for details).
[Figure 5: diagrams of the elementary gates H, CNOT, Ph(n) and Rz(n), and of
the combinators PAR(M, N), ANC(M), SEQ(M, N), ctl and invert]

Fig. 5: Circuit combinators

Expression        e ::= x | c | f(e1, ..., en) | let (x1, ..., xn) = e in e' |
                        if e1 then e2 else e3 | iter f e1 e2
Data Constructor  c ::= n | tt | ff | (e1, ..., en) | CNOT | SWAP | ID | H |
                        Ph(e) | Rz(e) | ANC(e) | SEQ(e1, e2) | PAR(e1, e2)
Function          f ::= f_d | f_c
Declaration       d ::= let f_d(x1, ..., xn) = e
Type              A ::= bool | int | ⊤ | A1 × ... × An | circ
Value             v ::= x | n | tt | ff | (v1, ..., vn) |
                        CNOT | SWAP | ID | H | Ph(n) | Rz(n) | ANC(v) |
                        SEQ(v1, v2) | PAR(v1, v2)
Context        C[-] ::= [-] | f(v1, ..., v(i-1), C[-], e(i+1), ..., en) |
                        let (x1, ..., xn) = C[-] in e | if C[-] then e2 else e3 |
                        iter f C[-] e | iter f v C[-] |
                        (v1, ..., v(i-1), C[-], e(i+1), ..., en) |
                        CNOT | ID | H | Ph(C[-]) | Rz(C[-]) | ANC(C[-]) |
                        SEQ(C[-], e) | SEQ(v, C[-]) | PAR(C[-], e) | PAR(v, C[-])


Fig. 6: Syntax for QBRICKS-DSL


On top of circ, the type system of QBRICKS-DSL features the type of integers
int (with constructors n, one for each integer n), the type of Booleans bool
(with constructors tt and ff) and the type of n-ary products (with constructor
(e1, ..., en)). This type system is not meant to be exhaustive and it can be
extended with usual constructs such as floats, lists and other user-defined
inductive datatypes: its embedding into WhyML makes it easy to use such types.
The term constructs are limited to function calls, let-style composition, test
with if-then-else and simple iteration: iter f n a stands for
f(f(... f(a) ...)), with n calls to f. We again stress that this could easily be
extended; we just do not need it.


The language is first-order: this is reflected by the types A of expressions.
The type of a function is given by the types of its arguments and the type of
its output. The type of a function with inputs of types $A_i$ and output of
type B is written $A_1 \times \cdots \times A_n \to B$.


A function f is either a function f_d defined with a declaration d or a
constant function f_c. The functions defined by declarations must not be
mutually recursive: this small, restricted language only features iteration.
Constant functions consist in integer operators (+, ×, −, etc), Boolean
operators (∧, ∨, ¬, →, etc), comparison operators
(<, ≤, >, ≥, =, ≠ : int × int → bool) and high-level circuit operators:
ctl, invert : circ → circ for controlling and inverting circuits, and
width, size : circ → int for counting the number of input and output wires, and
the number of gates (not counting ID nor SWAP) in the circuit. See Figure 5 for
the intuitive definition of circuit combinators.

The typing rules are the usual ones, summarized for convenience in Figure 7.


  Γ ⊢ f : A1 × ... × An → B    Γ ⊢ ei : Ai
  ------------------------------------------        --------------------
  Γ ⊢ f(e1, ..., en) : B                            Γ, x : A ⊢ x : A

  Γ ⊢ ei : Ai
  ------------------------------------------
  Γ ⊢ (e1, ..., en) : A1 × ... × An

  Γ ⊢ e1 : A1 × ... × An    Γ, x1 : A1, ..., xn : An ⊢ e2 : B
  --------------------------------------------------------------
  Γ ⊢ let (x1, ..., xn) = e1 in e2 : B

  Γ ⊢ e1 : bool    Γ ⊢ e2 : A    Γ ⊢ e3 : A
  --------------------------------------------
  Γ ⊢ if e1 then e2 else e3 : A

  f : A → A    Γ ⊢ e1 : int    Γ ⊢ e2 : A
  ------------------------------------------
  Γ ⊢ iter f e1 e2 : A


Fig. 7: Typing rules for QBRICKS-DSL

4.2 Operational semantics. As any other regular functional programming
language, QBRICKS-DSL is equipped with an operational semantics based on
beta-reduction and substitution. We define a notion of value and applicative
context as in Fig. 6. We then define a rewriting strategy as the relation
defined by $C[e] \to C[e']$ whenever $e \to e'$ is one of the rules of Table 8.
The table is split into the rules for the language constructs and the rules
defining the behavior of the constant functions. We only give a subset of the
latter rules. For instance, the arithmetic operations are defined in a canonical
manner, and the Boolean and comparison operators are defined in a similar manner
on values of type int and bool. The rules for the constant functions acting on
circuits are also for the most part straightforward: for instance, the size of a
sequence is the sum of the sizes of its components. The rules which we do not
provide are the ones for the control operation ctl: the intuition behind their
definition can be found in [13]. For the elementary gates, any definition can be
used (see e.g. [50]), as long as it can be written with the chosen set of gates.
One just has to adjust the lemmas referring to ctl in QBRICKS-SPEC. Similarly,
the inverses of elementary gates are not given: we can choose the usual ones
from the literature, and this definition is then parametrized by the choice of
gates.


Language constructs
  Assuming that there is a declaration f(x1, ..., xn) ≜ e:
  f(v1, ..., vn) → e[x1 := v1, ..., xn := vn]
  let (x1, ..., xn) = (v1, ..., vn) in e → e[x1 := v1, ..., xn := vn]
  if tt then e1 else e2 → e1
  if ff then e1 else e2 → e2
  when n ≤ 0: iter f n a → a
  when n > 0: iter f n a → f(iter f (n−1) a)

Constant functions (subset of the rules)
  n + m → n + m                               width(CNOT) → 2
  n − m → n − m                               width(SWAP) → 2
  n * m → n * m                               width(g) → 1 (g other gate)
  size(ID) → 0                                width(SEQ(v1, v2)) → width(v1)
  size(SWAP) → 0                              width(PAR(v1, v2)) → width(v1) + width(v2)
  size(g) → 1 (g other gate)                  width(ANC(v)) → width(v) − 1
  size(SEQ(v1, v2)) → size(v1) + size(v2)     invert(SEQ(v1, v2)) → SEQ(invert(v2), invert(v1))
  size(PAR(v1, v2)) → size(v1) + size(v2)     invert(PAR(v1, v2)) → PAR(invert(v1), invert(v2))
  size(ANC(v)) → size(v)                      invert(ANC(v)) → ANC(invert(v))

Table 8: Operational semantics for QBRICKS-DSL


4.3 Properties. The targeted low-level representation for an expression of
type circ is a value made of the circuit data constructors presented in
Figure 6: a value v of type circ is made out of the grammar
SEQ(v1, v2) | ANC(v) | PAR(v1, v2) | CNOT | SWAP | ID | H | Ph(n) | Rz(n).
Since recursions are reduced to finite iterations, we can derive the following
lemma through a simple inductive reasoning:

Lemma 1 (Safety properties and normalization). Provided that ⊢ e : A is a
closed expression, and provided that all the functions in e recursively admit
(external) definitions, then either e is a value or it reduces. If Γ ⊢ e : A
and e → e', then Γ ⊢ e' : A. Finally, the reduction strategy (→) is
normalizing: there does not exist an infinite reduction sequence
e1 → e2 → ...
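The reduction rules for the constant functions size, width and invert can be mimicked by structural recursion, as in the sketch below. The encoding is an assumption made here for illustration: gates are strings or tagged tuples, CNOT and SWAP are taken to have width 2 and every other gate width 1, and invert is only given for the self-inverse gates, since the paper leaves the inverses of the other elementary gates to the chosen gate definitions.

```python
# Circuits as nested tuples: "H", ("Ph", n), ("SEQ", c1, c2), ("ANC", c), ...
SELF_INVERSE = ("CNOT", "SWAP", "ID", "H")

def is_gate(c):
    """Elementary gates: constant gates or the indexed gates Ph(n), Rz(n)."""
    return isinstance(c, str) or c[0] in ("Ph", "Rz")

def width(c):
    """Number of wires of a circuit value."""
    if c in ("CNOT", "SWAP"):
        return 2
    if is_gate(c):
        return 1
    if c[0] == "SEQ":
        return width(c[1])
    if c[0] == "PAR":
        return width(c[1]) + width(c[2])
    if c[0] == "ANC":
        return width(c[1]) - 1
    raise ValueError("not a circuit value")

def size(c):
    """Number of gates, not counting ID nor SWAP."""
    if c in ("ID", "SWAP"):
        return 0
    if is_gate(c):
        return 1
    if c[0] in ("SEQ", "PAR"):
        return size(c[1]) + size(c[2])
    if c[0] == "ANC":
        return size(c[1])
    raise ValueError("not a circuit value")

def invert(c):
    """Inverse circuit: sequences are reversed, other combinators are mapped."""
    if c in SELF_INVERSE:
        return c
    if isinstance(c, tuple) and c[0] == "SEQ":
        return ("SEQ", invert(c[2]), invert(c[1]))
    if isinstance(c, tuple) and c[0] == "PAR":
        return ("PAR", invert(c[1]), invert(c[2]))
    if isinstance(c, tuple) and c[0] == "ANC":
        return ("ANC", invert(c[1]))
    # The inverses of Ph(n) and Rz(n) depend on the gate definitions chosen.
    raise NotImplementedError("inverse of this elementary gate is not fixed here")

example = ("SEQ", ("PAR", "H", "ID"), "CNOT")
```

Every recursive call is made on a strict sub-circuit, so these functions always terminate, in line with the normalization property of the lemma.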


Example 1. The example of Section 3.1 can be written in QBRICKS-DSL as


let aux(x) = SEQ(x, H)
let main(n) = iter aux n ID


The function aux inputs a circuit and appends a Hadamard gate at the end. The
function main then inputs an integer parameter n and iterates the function aux
to obtain n Hadamard gates in sequence. In particular, one can show that for
instance


main 4 →* SEQ(SEQ(SEQ(SEQ(ID, H), H), H), H),
that is, a sequence of 4 Hadamard gates.
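The behaviour of iter on this example can be replayed with a direct transcription into Python. This is only a sketch: circuits are encoded as nested tuples, an assumption made here, and `iter_`, `aux`, `main` and `count_h` are plain Python functions rather than QBRICKS-DSL declarations.

```python
def iter_(f, n, a):
    """iter f n a: returns a when n <= 0, and f(iter f (n-1) a) otherwise."""
    return a if n <= 0 else f(iter_(f, n - 1, a))

def aux(x):
    """Append a Hadamard gate at the end of circuit x."""
    return ("SEQ", x, "H")

def main(n):
    """Circuit made of n Hadamard gates in sequence, starting from ID."""
    return iter_(aux, n, "ID")

def count_h(c):
    """Count the Hadamard gates occurring in a circuit built by main."""
    if c == "H":
        return 1
    if isinstance(c, tuple):
        return sum(count_h(sub) for sub in c[1:])
    return 0
```

For n = 4 the result is the nested SEQ value given above, and a non-positive n yields the bare ID gate.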


4.4 Universality and usability of the chosen circuit constructs. A universal
(resp. pseudo-universal) set of elementary gates is such that they can be
composed thanks to sequence or parallelism so as to perform (resp. approach
arbitrarily close) any unitary matrix. In QBRICKS-DSL, we chose the small
pseudo-universal elementary set
$\{CNOT, SWAP, ID, H\} \cup \bigcup_{n \in \mathbb{N}} \{Ph(n), Rz(n)\}$.
Other gates can then be defined as macros on top of them. If one aims at using
QBRICKS inside a verified compilation tool-chain, these macros can for instance
be the gates of the targeted architecture.
4.5 Validity of circuits. A circuit is represented as a rigid rectangular shape
with a fixed number of input and output wires. In particular, there is a notion
of validity: a circ object only makes sense provided two constraints hold:


— in SEQ(C1, C2), the two circuits C1 and C2 should have the same width. For
instance, SEQ(CNOT,H) is not valid. This is a simple syntactic constraint;

— in ANC(C), the circuit C should have n +1 wires. Moreover, if given as input 
a vector where the last qubit is in state |0), its output should also leave this 
qubit in state |0). This condition is, on the other hand, a semantic constraint. 


Note that even these syntactic constraints cannot be checked by a simple typing 
procedure, because of the higher-order reasoning involved here: the constraints 
must hold for any value of the parameters. All these constraints apply on pa- 
rametrized circuits. They translate into constraints for the parameters of their 
related PPS and are expressed in our domain-specific logical specification lan- 
guage, QBRICKS-SPEC. They are meant to be sent as proof obligations to a proof 
engine. 


Example 2. Note how the circuit generated by main in Example 1 is not
necessarily a valid circuit (although in this case it is). This is one of the
constraints that can be handled by QBRICKS-SPEC, as shown in Example 4.


4.6 Denotational semantics. As all expressions in QBRICKS-DSL are terminating,
one can use regular sets as denotational semantics for the language. In order to
be able to handle the definitions coming up in Section 5, we include in the
denotation of each type an “error” element $\bot$. We therefore define the
denotation of basic types as the set of their values:
$[\![bool]\!] = \{tt, ff, \bot\}$, $[\![int]\!] = \mathbb{Z} \cup \{\bot\}$ and
$[\![circ]\!] = \{v \mid\ \vdash v : circ\} \cup \{\bot\}$. Product types are
defined as the set-product:
$[\![A_1 \times \cdots \times A_n]\!] = ([\![A_1]\!] \times \cdots \times [\![A_n]\!]) \cup \{\bot\}$
and $[\![\top]\!] = \{\star, \bot\}$, the singleton set. Finally, functions are
defined as set-functions from the input set to the output set. The denotations
of the language constructs are the usual ones in a semantics based on sets; for
the constant functions, the definitions are the canonical ones: arithmetic
operations map to arithmetic operations, for instance. In QBRICKS-DSL,
everything is well-defined and $\bot$ is only attainable from $\bot$. For
instance, $\bot + x = \bot$.

Note that in the denotational semantics one can build non-valid circuits. For
instance, the circuit SEQ(CNOT,H) is a member of $[\![circ]\!]$. This is to be
expected as we have the following property:


Lemma 2 (Soundness). Provided that $\vdash e : A$, we have
$[\![e]\!] \in [\![A]\!] \setminus \{\bot\}$. Moreover, provided that
$e \to e'$, we have $[\![e]\!] = [\![e']\!]$.


It is however possible to formalize the notion of syntactically valid circuits
as a subset of $[\![circ]\!]$.


Definition 1. We define the (syntactic) unary relation $V_{syntax}$ on
$[\![circ]\!]$ as follows: each one of the gates belongs to $V_{syntax}$; if
$C_1$ and $C_2$ belong to $V_{syntax}$ then so does PAR($C_1$, $C_2$); if
moreover $2 \leq [\![width]\!](C_1)$ then ANC($C_1$) belongs to $V_{syntax}$,
and if $[\![width]\!](C_1) = [\![width]\!](C_2)$ then SEQ($C_1$, $C_2$) belongs
to $V_{syntax}$.
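The inductive definition of syntactic validity translates into a recursive check, sketched below under the same assumptions as before: circuits are nested tuples, CNOT and SWAP have width 2 and the other gates width 1, and `syntactically_valid` is a hypothetical name. The semantic side condition on ANC (the ancilla wire must be returned in state |0>) is deliberately not covered.

```python
def is_gate(c):
    """Elementary gates: constant gates or the indexed gates Ph(n), Rz(n)."""
    return isinstance(c, str) or c[0] in ("Ph", "Rz")

def width(c):
    """Number of wires of a circuit value."""
    if c in ("CNOT", "SWAP"):
        return 2
    if is_gate(c):
        return 1
    if c[0] == "SEQ":
        return width(c[1])
    if c[0] == "PAR":
        return width(c[1]) + width(c[2])
    return width(c[1]) - 1          # remaining case: ANC

def syntactically_valid(c):
    """Membership in V_syntax: widths must agree in SEQ, ANC needs >= 2 wires."""
    if is_gate(c):
        return True
    if c[0] == "PAR":
        return syntactically_valid(c[1]) and syntactically_valid(c[2])
    if c[0] == "SEQ":
        return (syntactically_valid(c[1]) and syntactically_valid(c[2])
                and width(c[1]) == width(c[2]))
    if c[0] == "ANC":
        return syntactically_valid(c[1]) and 2 <= width(c[1])
    return False

bad = ("SEQ", "CNOT", "H")                  # widths 2 and 1 do not match
good = ("SEQ", "CNOT", ("PAR", "H", "ID"))  # both sides have width 2
```

Such a check works on one concrete circuit value. For a parametrized program the same property has to hold for every value of the parameters, which is why it is stated as a proof obligation rather than decided by typing.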
5 QBRICKS-SPEC


The language QBRICKS-DSL is only aimed at manipulating circuits. The reasoning
features of QBRICKS, and the PPS introduced in Section 3, are defined in the
logic and the specification tools offered within QBRICKS-SPEC.


5.1 Syntax of QBRICKS-SPEC. We define QBRICKS-SPEC as a first-order,
predicate logic with the following syntax.


Formula                 φ, ψ ::= φ ∨ ψ | φ ∧ ψ | ¬φ | φ → ψ |
                                 R(ê1, ..., ên) | ê1 = ê2
First-order expression  ê ::= x | c(ê1, ..., ên) | f(ê1, ..., ên) | f_l(ê1, ..., ên).


The first-order expressions ê form a subset of QBRICKS-DSL: they are restricted
to variables and (formal) function calls to other first-order expressions.
Unlike regular, general expressions (meant to be computational vehicles), these
first-order expressions only aim at being reasoned upon. The function names are
then expanded with counterpart logical functions f_l. Among these new functions,
we introduce one function iter_f : int × A → A for each function f : A → A,
standing for the equational counterpart of the iteration.⁵ The logic functions
are defined equationally in the logic: see Section 6.4 for details. The relation
R ranges over a list of constant relations over first-order expressions. In
QBRICKS-SPEC, we identify relations and functions of return type bool. A special
relation is the equality: we explicitly introduce it in the syntax to emphasize
the fact that QBRICKS-SPEC is meant to deal with equational theories.

The type system of QBRICKS-SPEC is extended with opaque types, equipped
with constant functions and relations to reason upon them. They come with
no computational content: the aim is purely to be able to express and prove
specification properties of programs. This is why we do not incorporate them in
QBRICKS-DSL’s type system.

The opaque types we consider in QBRICKS-SPEC are complex, real, pps,
ket and bitvector. The operators and relations for these new types are given
in Table 9. Note that in the rest of the paper we will omit the cast operations
i_to_r and r_to_c. We will also use a declared exponentiation function
$[-]^{[-]}$ overloaded with types complex × int → complex and
real × int → real. For any integer n and Boolean b, the constructor bv_cst
builds the bit vector of length n and constant value b. Other functions for
types complex, real and bitvector are standard. Types pps and ket are novel and
form the main reasoning vehicle in QBRICKS-SPEC.


5.2 The types pps and ket. In short, the type pps encodes our parametrized 
path sum (PPS) representation for expressions of type circ in QBRICKS-DSL, 
while ket encodes the notion of ket-vector. As these types are pure reasoning 
apparatuses, we only need them in QBRICKS-SPEC and they are defined uniquely 
through an equational theory. 


5 This is required to stay within the grammar of terms of QBRICKS-SPEC. 
complex and real
  i, π : complex
  i_to_r : int → real
  r_to_c : real → complex
  Re, Im, abs : complex → real
  e^[-] : complex → complex
  −c, +c, *c, /c : complex × complex → complex
  −r, +r, *r, /r : real × real → real
  √[-] : real → real

bitvector
  bv_length : bitvector → int
  bv_cst : int × bool → bitvector
  bv_get : bitvector × int → bool
  bv_set : bitvector × int × bool → bitvector

pps
  pps_width : pps → int
  pps_range : pps → int
  pps_angle : pps × bitvector × bitvector → real
  pps_ket : pps × bitvector × bitvector → bitvector
  pps_apply : pps × ket → ket
  pps_equiv : pps × pps → bool
  circ_to_pps : circ → pps

ket
  ket_length : ket → int
  ket_get : ket × bitvector → complex
  bv_to_ket : bitvector → ket
  +k, −k, ⊗k : ket × ket → ket
  * : complex × ket → ket


Table 9: Primary operators for QBRICKS-SPEC


The type pps is equipped with four opaque accessors, pps_width, pps_range,
pps_ket and pps_angle, acting on pps as described in Section 3.2, and with
the function pps_apply. Although path-sums compose nicely, a given linear map
does not have a unique representative path-sum (partly due to the fact that
phase polynomials are equal modulo 2π). To capture this equivalence, we
introduce the constant relation pps_equiv. In order to relate circuits and PPS,
we introduce the constant function circ_to_pps: it returns one possible PPS that
represents the input circuit. The chosen PPS is built in a constructive manner
on the structure of the circuit. A useful relation is $(- \triangleright -)$,
relating a circuit and a PPS: it is defined as
$(c \triangleright h)$ = pps_equiv(circ_to_pps(c), h). Another useful macro
is the function circ_apply : circ × ket → ket, defined as


circ_apply(C, k) = pps_apply(circ_to_pps(C), k)


The type ket is equipped with standard operations for manipulating ket-vectors
(Table 9). bv_to_ket turns a bit vector into a basis ket-vector; ket_length
returns the number of qubits in the ket; ket_get returns the amplitude of the
corresponding basis ket-vector. The other operations are the usual operations on
vectors: addition, subtraction, tensors, scalar multiplication.


5.3 Denotational semantics of the new types. The denotational semantics of
real and complex are respectively the sets $\mathbb{R} \cup \{\bot\}$ and
$\mathbb{C} \cup \{\bot\}$, and the denotations of the operators are the
canonical ones. As in Section 4.6, $\bot$ maps to $\bot$, so for instance
$\bot +_r x = \bot$. The denotation of bitvector is defined as the set of all
bit-vectors, together with the “error” element $\bot$. The constant functions
are mapped to their natural candidate definition, using $\bot$ as the default
result when they should not be defined. So for instance,
$[\![bv\_cst]\!](-1, tt) = \bot$. An element of ket is meant to be a
ket-vector: we define $[\![ket]\!]$ as the set of all possible ket-vectors
$\sum_n \alpha_n |b_n\rangle_m$, for all possible $m, n \in \mathbb{N}$,
$\alpha_n \in \mathbb{C}$ and bit-vectors $b_n$ of size m, together with the
error element $\bot$. Finally, pps is defined as the set of formal path-sums, as
defined in the previous sections, together with the error element $\bot$. The
denotations of the constant functions are defined as discussed in Section 3.2.
As an example, $[\![pps\_range]\!]$ returns the range of the corresponding PPS.
The map circ_to_pps builds a valid PPS out of the input circuit, or $\bot$ if
the circuit is not valid.

The defined PPS follows the structure of the circuit. For instance, the PPS
circ_to_pps(SEQ(C1, C2)) is the sequential composition of the two PPS
circ_to_pps(C1) and circ_to_pps(C2). This kind of compositionality is what helps
with automation.
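The treatment of the error element can be pictured with a few lines of Python. This is a sketch only: `BOTTOM` stands for the error element, `strict` is a hypothetical helper that propagates it, and bit vectors are encoded as tuples of Booleans.

```python
BOTTOM = object()   # stands for the "error" element of every denotation

def strict(f):
    """Lift f so that any BOTTOM argument yields BOTTOM."""
    def lifted(*args):
        if any(a is BOTTOM for a in args):
            return BOTTOM
        return f(*args)
    return lifted

add_r = strict(lambda x, y: x + y)   # addition on reals, extended with BOTTOM
bv_length = strict(len)              # length of a bit vector

def bv_cst(n, b):
    """Constant bit vector of length n; undefined (BOTTOM) for negative n."""
    if n is BOTTOM or b is BOTTOM or n < 0:
        return BOTTOM
    return (b,) * n
```

With this encoding, bv_cst(-1, True) evaluates to BOTTOM, and so does any operation applied to it.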


5.4 Regular sequents in QBRICKS-SPEC. Formulas in QBRICKS-SPEC are typed
objects and, as mentioned in Section 5.1, one can identify them with first-order
expressions of type bool. Due to this correspondence, we shall only say that
logic judgments in QBRICKS-SPEC are well-formed judgments of the form
$\Delta \vdash \varphi$, where the well-formedness means that
$\Delta \vdash \varphi : bool$ is a valid typing judgment in QBRICKS-DSL. That
being said, a well-formed judgment $\Delta \vdash \varphi$ is valid whenever it
holds in the denotational semantics: for every instantiation $\sigma$ sending
$x : A$ in $\Delta$ to $[\![A]\!]$, the denotation $[\![\varphi]\!]_\sigma$ is
valid. In particular, the (free) variables of $\varphi$ can be regarded as
universally quantified by the context $\Delta$.


5.5 Parametricity of PPS. A regular path-sum is not parametric: it represents
one fixed functional. So why did we choose $[\![pps]\!]$ to be a set of
path-sums? Let us consider an example.


Example 3. Consider the motivating example of Section 3.1 and its instantiation
in Example 1. The function main describes a family of circuits indexed by an
integer parameter n. Now, consider the typing judgment


$h : pps,\ n : int \vdash (main(n) \triangleright h) : bool.$


It can be regarded as a relation between PPS h and integers n, valid whenever h
represents main(n). Technically, this relation is not quite the graph of a
function (since several PPS might match the circuit main(n)).


5.6 Standard matrix semantics and correctness of PPS semantics. 
Similarly to the type pps, QBRICKS is endowed with a (logical) type matrix to 
handle the matrix interpretation of circuits, together with various functions and 
relations to reason on it. In particular, QBRICKS features a function 
$\mathtt{mat\_get} : \mathtt{matrix} \times \mathtt{int} \times \mathtt{int} \rightarrow \mathtt{complex}$, 
formalizing the access to a matrix element, and a function 
$\mathtt{circ\_to\_mat} : \mathtt{circ} \rightarrow \mathtt{matrix}$ realizing 
the matrix corresponding to a circuit. We then formally show, within our 
framework (proven in Why3), that for any valid circuit C and ket k of length 
width(C), applying circ_to_pps(C) on k is equivalent to multiplying it by 
circ_to_mat(C): 


Theorem 1 (Soundness of PPS wrt matrix semantics). 


\[ C : \mathtt{circ},\ k : \mathtt{ket} \vdash \mathtt{ket\_length}(k) = \mathtt{width}(C) \wedge \mathtt{valid}(C) \rightarrow 
   \mathtt{apply\_mat}(\mathtt{circ\_to\_mat}\ C, k) = \mathtt{pps\_apply}(\mathtt{circ\_to\_pps}\ C, k) \]
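
As an informal illustration of what Theorem 1 states, the following sketch compares the two semantics numerically for a single Hadamard gate. It is not the Why3 development: the function names and the phase convention (one path variable y, phase exp(i*pi*x*y), amplitude scaled by 1/sqrt(2)) are assumptions made for this example only.

```python
import cmath
import math
from itertools import product

def had_pps_apply(ket):
    """Apply a hypothetical path-sum for HAD to a 1-qubit ket [a0, a1].

    Assumed convention: for input bit x and path variable y, the path
    contributes phase exp(i*pi*x*y) / sqrt(2) to the output basis ket |y>.
    """
    out = [0j, 0j]
    for x, y in product((0, 1), repeat=2):
        out[y] += ket[x] * cmath.exp(1j * math.pi * x * y) / math.sqrt(2)
    return out

def had_mat_apply(ket):
    """Apply the usual 2x2 Hadamard matrix to the same ket."""
    h = [[1, 1], [1, -1]]
    return [sum(h[r][c] * ket[c] for c in range(2)) / math.sqrt(2)
            for r in range(2)]

def close(u, v, tol=1e-12):
    """Componentwise comparison up to floating-point noise."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

kets = [[1, 0], [0, 1], [0.6, 0.8j]]
agree = all(close(had_pps_apply(k), had_mat_apply(k)) for k in kets)
```

Such a numeric comparison only tests a few inputs of one fixed gate; the theorem covers every valid circuit and every ket of matching length.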
6 Reasoning on Quantum Programs 


Thanks to the logic presented in Section it is possible to write QBRICKS- 
SPEC formulas and to express properties of terms of the restricted syntax of 
Section Provided that the regular sequents are simple enough, these can 
automatically be handled with the use of SMT solvers. 

In this section, we define a specific Hoare logic, Hybrid Quantum Hoare Logic (HQHL), 
to express pre- and post-conditions for arbitrary QBRICKS-DSL terms. We then 
discuss the validity of such judgments and explain how to decompose them into 
elementary, regular sequents (proof obligations). The claim —backed up by our 
experiments in Section is that the obtained sequents are in practice simple 
enough to be dealt with automatically. 

We do not present all HQHL rules here, but simply aim to give an intuition of 
how and why one can rely on an automated deductive system to derive QBRICKS- 
SPEC judgments. The complete set of HQHL rules is presented in [13]. 


6.1 HQHL judgments. In order to be able to express program specifications 
with pre- and post-conditions, we introduce Hybrid Quantum Hoare Logic 
(HQHL) sequents of the form $\Delta \Vdash \{\varphi\}\, e\, \{\psi\} : A$ (we 
omit the type $A$ when irrelevant or clear). The formula $\psi$ can make use of 
a reserved free variable result of type $A$. Such a sequent is then well-formed 
provided that $\Delta \vdash \varphi : \mathtt{bool}$, 
$\Delta, \mathtt{result} : A \vdash \psi : \mathtt{bool}$ and 
$\Delta \vdash e : A$ are valid typing judgments. Note how the reserved free 
variable result is added to $\Delta$ for typing $\psi$. For convenience, as 
syntactic sugar we allow indexed variables $\mathtt{result}_i$ to stand for the 
$i$th projection of a tuple. 

The validity of an HQHL sequent can be defined semantically, similarly to 
what was done for regular sequents: $\Delta \Vdash \{\varphi\}\, e\, \{\psi\} : A$ 
is valid whenever it is well-formed and, for every instantiation $\sigma$ 
sending each $x : A$ in $\Delta$ to $[\![A]\!]$ and sending result to 
$[\![e]\!]_\sigma$, the denotation $[\![\varphi \rightarrow \psi]\!]_\sigma$ is valid. 
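
The semantic reading of validity can be mimicked on a finite domain. The sketch below is only an illustration under strong assumptions: contexts are dictionaries, denotations are ordinary Python functions, and the helper `triple_holds` (a hypothetical name) enumerates a finite set of instantiations instead of proving the property for all of them.

```python
from itertools import product

def triple_holds(domains, pre, expr, post):
    """Check pre(sigma) -> post(sigma, result) for every instantiation sigma.

    domains: dict mapping each context variable to a finite iterable of values.
    pre:     function sigma -> bool (the precondition).
    expr:    function sigma -> value (the denotation of the term e).
    post:    function (sigma, result) -> bool (the postcondition).
    """
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        sigma = dict(zip(names, values))
        if pre(sigma) and not post(sigma, expr(sigma)):
            return False
    return True

domains = {"n": range(-5, 6)}

# {n >= 0} n + 1 {result > 0} holds on the sampled domain.
ok = triple_holds(domains,
                  lambda s: s["n"] >= 0,
                  lambda s: s["n"] + 1,
                  lambda s, result: result > 0)

# Dropping the precondition makes the triple fail (for instance at n = -5).
bad = triple_holds(domains,
                   lambda s: True,
                   lambda s: s["n"] + 1,
                   lambda s, result: result > 0)
```

Enumeration of this kind is a test, not a proof: the deduction rules that follow are what allows validity to be established for all instantiations.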

In the following sections, we describe the deduction rules that we rely on 
in QBRICKS. They are designed to be used in a bottom-up strategy to break 
down judgments into pieces reasoning on smaller terms. Along the way, there 
is the need for introducing invariants and assertions. As usual, some of these 
assertions can be derived by computing the weakest-preconditions: we do not 
necessarily have to introduce every single one. When attaining a term of the 
restricted grammar of QBRICKS-SPEC that cannot be further decomposed, one 
can rely on the rule 

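\[ \frac{\Gamma \vdash \varphi \rightarrow \psi[\mathtt{result} := \hat{e}]}
        {\Gamma \Vdash \{\varphi\}\, \hat{e}\, \{\psi\} : A}\ (\text{f-o}) \]


to generate a proof obligation as a regular sequent in QBRICKS-SPEC. 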


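6.2 Deduction rules for term constructs. Figure 10 presents the deduction 
rules for the term constructs of QBRICKS-DSL carrying a computational 
content: iteration, tests, function evaluation, etc. We also present a standard 
weakening rule (weaken) and an example of a rewriting rule: the deduction 
rule (eq) states that whenever two expressions are equal, one can substitute one 
for the other inside an HQHL judgment. Finally, we can derive from the 
semantics the usual substitution rules. For instance, provided that 
$\Gamma, x : A \vdash \psi$ and $\Gamma \vdash \hat{e} : A$, then 
$\Gamma \vdash \psi[x := \hat{e}]$. Note that in the rules, the first-order 
expressions of the form $\hat{e}$ are from the restricted grammar of terms of 
QBRICKS-SPEC. 


\[ \frac{\Gamma, x \Vdash \{\varphi \wedge x \le 0\}\ e_2\ \{P[x, \mathtt{result}]\}
         \qquad
         \Gamma, x, y \Vdash \{\varphi \wedge P[x, y]\}\ f(y)\ \{P[x+1, \mathtt{result}]\}}
        {\Gamma \Vdash \{\varphi\}\ \mathtt{iter}\ f\ \hat{e}_1\ e_2\ \{P[\hat{e}_1, \mathtt{result}]\}}\ (\text{iter}) \]

\[ \frac{\Gamma \Vdash \{P\}\ e_1\ \{Q[x_i := \mathtt{result}_i]\}
         \qquad
         \Gamma, x_1, \ldots, x_n \Vdash \{Q\}\ e_2\ \{R\}}
        {\Gamma \Vdash \{P\}\ \mathtt{let}\ x_1, \ldots, x_n = e_1\ \mathtt{in}\ e_2\ \{R\}}\ (\text{let}) \]

\[ \frac{\Gamma \Vdash \{P\}\ e_1\ \{Q[x := \mathtt{result}]\}
         \qquad
         \Gamma, x \Vdash \{Q \wedge x\}\ e_2\ \{R\}
         \qquad
         \Gamma, x \Vdash \{Q \wedge \neg x\}\ e_3\ \{R\}}
        {\Gamma \Vdash \{P\}\ \mathtt{if}\ e_1\ \mathtt{then}\ e_2[x := e_1]\ \mathtt{else}\ e_3[x := e_1]\ \{R\}}\ (\text{if}) \]

\[ \frac{\forall i,\ \Gamma \Vdash \{P\}\ e_i\ \{R_i[\mathtt{result}]\}}
        {\Gamma \Vdash \{P\}\ \langle e_1, \ldots, e_n \rangle\ \{R_1[\mathtt{result}_1] \wedge \cdots \wedge R_n[\mathtt{result}_n]\}} \]

\[ \frac{f(x_1, \ldots, x_n) \triangleq e
         \qquad
         \Gamma \Vdash \{P\}\ e[x_1 := e_1, \ldots, x_n := e_n]\ \{R\}}
        {\Gamma \Vdash \{P\}\ f(e_1, \ldots, e_n)\ \{R\}}\ (\text{decl}) \]

(weaken): conclusion $\Gamma \Vdash \{P\}\ e\ \{Q\} : A$ (premises illegible in the source) 

\[ \frac{\Gamma \vdash e_1 = e_2 : A
         \qquad
         \Gamma \Vdash \{P[e_1]\}\ e[e_1]\ \{Q[e_1]\} : A}
        {\Gamma \Vdash \{P[e_2]\}\ e[e_2]\ \{Q[e_2]\} : A}\ (\text{eq}) \]


Fig. 10: Deduction rules for QBRICKS: HQHL rules for term constructs 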


6.3 Deduction rules for pps. The main tools to relate circuits and PPS are 
the constant function circ_to_pps, its relational counterpart 
$(- \triangleright -)$, and the declared function circ_apply. They can be 
specified inductively on the structure of the input circuit. The complete set of 
rules for circ_to_pps and $(- \triangleright -)$ can be found in [13]. 


Compositionality of SEQ. For instance, one can derive the deduction rules 
for circ_apply applied to SEQ from Table These rules can be used in a 
bottom-up manner to derive composable, elementary properties of circuits out 
of sub-circuits. In the table, we abbreviate pps_acc(circ_to_pps(-)) with C_acc, 
for $\mathit{acc} \in \{\mathtt{width}, \mathtt{range}, \mathtt{ket}, \mathtt{angle}\}$ 
and, given two bit vectors $x$ and $y$, $x \cdot y$ denotes their concatenation. 


Example of deduction rule for HAD. Using the notations from above, we 
define the following axiom for function circ_to_pps applied to the gate HAD: 
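\[ \Gamma,\ x, y : \mathtt{bitvector} \Vdash
   \{\mathtt{bv\_length}(x) = 1 \wedge \mathtt{bv\_length}(y) = 1\}
   \ \mathtt{HAD}\ 
   \{\mathtt{C\_width}(\mathtt{result}) = 1 \wedge \mathtt{C\_range}(\mathtt{result}) = 1 \wedge
     \mathtt{C\_angle}(\mathtt{result}, x, y) = x[0] * y[0] \wedge
     \mathtt{C\_ket}(\mathtt{result}, x, y) = y\} \]


Prec-SEQ denotes the pair of premises 

\[ \Gamma \Vdash \{\varphi\}\ C_1\ \{\mathtt{C\_width}(\mathtt{result}, \{p\}) = w\}
   \qquad
   \Gamma \Vdash \{\varphi\}\ C_2\ \{\mathtt{C\_width}(\mathtt{result}, \{p\}) = w\} \]

\[ \frac{\text{Prec-SEQ}}
        {\Gamma \Vdash \{\varphi\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_width}(\mathtt{result}, \{p\}) = w\}}\ (\text{SEQ}_w) \]

\[ \frac{\{\text{Prec-SEQ}\}
         \qquad
         \Gamma \Vdash \{\varphi_1\}\ C_1\ \{\mathtt{C\_range}(\mathtt{result}, \{p\}) = r_1(\{p\})\}
         \qquad
         \Gamma \Vdash \{\varphi_2\}\ C_2\ \{\mathtt{C\_range}(\mathtt{result}, \{p\}) = r_2(\{p\})\}}
        {\Gamma \Vdash \{\varphi_1 \wedge \varphi_2\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_range}(\mathtt{result}, \{p\}) = r_1(\{p\}) + r_2(\{p\})\}}\ (\text{SEQ}_r) \]

\[ \frac{\begin{array}{l}
         \{\text{Prec-SEQ}\} \\
         \Gamma \Vdash \{\varphi_1\}\ C_1\ \{\mathtt{C\_angle}(\mathtt{result}, \{p\})(x, y_1) = a_1(\{p\}, x, y_1)\} \\
         \Gamma \Vdash \{\varphi_1\}\ C_1\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(x, y_1) = k_1(\{p\}, x, y_1)\} \\
         \Gamma \Vdash \{\varphi_2\}\ C_2\ \{\mathtt{C\_angle}(\mathtt{result}, \{p\})(k_1(\{p\}, x, y_1), y_2) = a_2(\{p\}, x, y_1, y_2)\}
         \end{array}}
        {\Gamma \Vdash \{\varphi_1 \wedge \varphi_2\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_angle}(\mathtt{result}, \{p\})(x, y_1 \cdot y_2) = a_1(\{p\}, x, y_1) + a_2(\{p\}, x, y_1, y_2)\}}\ (\text{SEQ}_a) \]

\[ \frac{\begin{array}{l}
         \{\text{Prec-SEQ}\} \\
         \Gamma \Vdash \{\varphi_1\}\ C_1\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(x, y_1) = k_1(\{p\}, x, y_1)\} \\
         \Gamma \Vdash \{\varphi_2\}\ C_2\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(k_1(\{p\}, x, y_1), y_2) = k_2(\{p\}, x, y_1 \cdot y_2)\}
         \end{array}}
        {\Gamma \Vdash \{\varphi_1 \wedge \varphi_2\}\ \mathtt{SEQ}(C_1, C_2)\ \{\mathtt{C\_ket}(\mathtt{result}, \{p\})(x, y_1 \cdot y_2) = k_2(\{p\}, x, y_1 \cdot y_2)\}}\ (\text{SEQ}_k) \]


Fig. 11: Deduction rules for circ_apply on sequence of circuits 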


Example 4. Consider the motivating example of Section 3.1 and its instantiation 
in Example 1. We can now give a specification to the function main, as follows: 


\[ n : \mathtt{int},\ m : \mathtt{int},\ x : \mathtt{ket} \Vdash
   \{n > 0 \wedge \mathtt{ket\_length}(x) = 1 \wedge n = 2 * m\}
   \ \mathtt{main}(n)\ 
   \{\mathtt{circ\_apply}(\mathtt{result}, x) = x\}. \]


The fact that circ_apply is well-defined implies that C is valid. 
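
The compositional treatment of SEQ can be sketched on a toy model. The class `PathSum`, the functions `had`, `seq` and `apply`, and the normalization (each path contributes 2^(-range/2) * exp(i*pi*angle)) are assumptions of this illustration and not the QBRICKS definitions. The composition follows the informal reading of the rules for SEQ: ranges add, angles add, and the ket of the second circuit is evaluated on the ket produced by the first.

```python
import cmath
import math
from itertools import product

class PathSum:
    """Toy path-sum: width, number of path variables, angle and output ket."""
    def __init__(self, width, rng, angle, ket):
        self.width, self.rng, self.angle, self.ket = width, rng, angle, ket

def had():
    # One path variable y; angle x[0] * y[0]; output ket is y.
    return PathSum(1, 1, lambda x, y: x[0] * y[0], lambda x, y: y)

def seq(p1, p2):
    """Sequential composition: p1 first, then p2, on split path variables."""
    r1 = p1.rng
    def angle(x, y):
        y1, y2 = y[:r1], y[r1:]
        return p1.angle(x, y1) + p2.angle(p1.ket(x, y1), y2)
    def ket(x, y):
        y1, y2 = y[:r1], y[r1:]
        return p2.ket(p1.ket(x, y1), y2)
    return PathSum(p1.width, p1.rng + p2.rng, angle, ket)

def apply(ps, x):
    """Sum all paths; returns a dict from output basis kets to amplitudes."""
    out = {}
    for y in product((0, 1), repeat=ps.rng):
        amp = cmath.exp(1j * math.pi * ps.angle(x, y)) / math.sqrt(2 ** ps.rng)
        key = tuple(ps.ket(x, y))
        out[key] = out.get(key, 0) + amp
    return out

# Two Hadamard gates in sequence act as the identity on basis kets.
hh = seq(had(), had())
result0 = apply(hh, (0,))
result1 = apply(hh, (1,))
```

In this toy setting, `result0` concentrates all amplitude on the ket (0,) and `result1` on (1,), which is the one-qubit, n = 2 instance of the specification of main given in Example 4.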


6.4 Equational reasoning. The SMT solvers we aim at using to discharge 
proof obligations require equational theories describing how to reason on the con- 
stant functions that were introduced. Some of these equational theories, such as 
bit-vectors and algebraic fields, are standard and well-known in verification. To- 
gether with a few properties on square-root, exponentiation, real and imaginary 
parts, the latter is all we need for real and complex: in quantum computation, 
the manipulations of real and complex numbers turn out to be quite limited: 
we do not need anything related to real or complex analysis. 

The main difficulty in the design of QBRICKS has been to lay out equational 
theories and lemmas for circ, pps and ket that can efficiently help in automa- 
tically discharging proof obligations. Many of these equations and lemmas are 
quite straightforward. For instance, we turn the rewriting rules of Table 8 into 
equations, such as 
$(x, y : \mathtt{circ}) \vdash \mathtt{width}(\mathtt{PAR}(x, y)) = \mathtt{width}(x) + \mathtt{width}(y)$, or 
$a : A,\ n : \mathtt{int} \vdash \mathtt{iter}_f(a, n+1) = f(\mathtt{iter}_f(a, n))$. 
These equations map the (syntactic) computational behavior of expressions into 
the logic. 

Other equations express purely semantic properties. For instance, 


\[ \Gamma,\ k : \mathtt{ket} \vdash \mathtt{circ\_apply}(\mathtt{SEQ}(C_1, C_2), k) = 
   \mathtt{circ\_apply}(C_2, \mathtt{circ\_apply}(C_1, k)) \qquad (3) \]


(together with a few hypotheses ensuring correct widths) can be derived from 
Table 11 and is part of the equational theory. 


6.5 Additional deductive rules. QBRICKS provides additional reasoning 
rules that we do not have enough space to detail here. Among them are: 


Circuit complexity. Certifying the complexity of quantum implementations 
(e.g., polynomial number of gates in the size of the input) is of primary 
importance since, in the mid-term, implementations will have to deal with 
limited hardware capacities, hence the need for tight circuit constructions. We 
stress that, while raised by several programming [80] or compilation works [48], 
this aspect of certification is not addressed by existing formal verification 
approaches [35,45,1]. 


Probabilities. The probability of obtaining a result by a measurement is corre- 
lated with the amplitudes of the corresponding ket-basis vectors in the quantum 
state of the memory. In QBRICKS-SPEC we define 
$\mathtt{proba\_partial\_measure} : \mathtt{circ} \times \mathtt{ket} \times \mathtt{bitvector} \rightarrow \mathtt{real}$, 
meaning that when the input circuit is applied to the input ket, if we were to 
measure the result, the probability of obtaining the given vector would be the 
result of the function. 
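
To make the link between amplitudes and measurement probabilities concrete, here is a small sketch. It differs from the QBRICKS-SPEC function in two assumed ways: it takes the already computed output state rather than a circuit and an input ket, and it assumes that the partial measurement reads the first qubits of the register. The state is a dictionary from basis bit tuples to complex amplitudes.

```python
def proba_partial_measure(state, bits):
    """Probability of reading `bits` on the first len(bits) qubits.

    state: dict mapping basis kets (tuples of 0/1) to complex amplitudes,
           assumed normalized.
    bits:  tuple of 0/1 outcomes for the measured prefix of the register.
    """
    bits = tuple(bits)
    return sum(abs(amp) ** 2
               for basis, amp in state.items()
               if basis[:len(bits)] == bits)

# A Bell state: (|00> + |11>) / sqrt(2).
bell = {(0, 0): 2 ** -0.5, (1, 1): 2 ** -0.5}

p_first_is_0 = proba_partial_measure(bell, (0,))
p_both_are_01 = proba_partial_measure(bell, (0, 1))
```

The probability is a plain function of the amplitudes, which is why it can be reasoned about in the specification language without a measurement construct in the circuit language.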


Wire identification. In some situations, to add a gate in a circuit it is easier 
to give the number (identifier) of the wire on which the gate applies (such as 
“apply HAD on wire n”) instead of sequencing the circuit with 
$\mathtt{ID}^{\otimes (n-1)} \otimes \mathtt{HAD}$. This is for instance the 
design chosen in QASM or SQIR [35]. 


In QBRICKS it is possible to define such a macro with the use of a derived 
constructor PLACE(C, k, n). For any circuit C and any integers k, n, if 
$0 < k < n - \mathtt{width}(C)$, PLACE(C, k, n) applies C on wires k to 
k+width(C)-1. It is defined as 
$\mathtt{ID}^{\otimes k} \otimes C \otimes \mathtt{ID}^{\otimes (n-k-\mathtt{width}(C))}$ 
where, for any $0 < i$, 
$\mathtt{ID}^{\otimes i} = \mathtt{iter}\ \text{par-ID}\ (i-1)\ \mathtt{ID}$ and 
par-ID(C) = PAR(C, ID). Similarly, QBRICKS also provides a constructor 
CONT(C, c, k, n) with an additional index c in [0, n[ and not in 
[k, k + width(C)[. Using an adequate qubit permutation, through combinations of 
PLACE and SWAP, it applies PLACE(C, k, n) with control c. 
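
The derived constructor can be sketched over a toy circuit datatype. The tuple encoding and the Python names `par`, `width`, `iterate`, `id_power` and `place` are assumptions of this illustration; only the shape of the definitions (iteration of par-ID, and PLACE as a parallel composition of identities around C) is taken from the text.

```python
# Circuits as nested tuples: ("ID",), ("HAD",), ("PAR", c1, c2), ("SEQ", c1, c2).
ID = ("ID",)
HAD = ("HAD",)

def par(c1, c2):
    return ("PAR", c1, c2)

def width(c):
    """Number of wires of a circuit."""
    tag = c[0]
    if tag in ("ID", "HAD"):
        return 1
    if tag == "PAR":
        return width(c[1]) + width(c[2])
    if tag == "SEQ":
        return width(c[1])
    raise ValueError("unknown circuit constructor: %r" % (tag,))

def iterate(f, n, a):
    """Apply f to a exactly n times (a is returned unchanged when n <= 0)."""
    for _ in range(n):
        a = f(a)
    return a

def id_power(i):
    """i parallel identity wires, built as iter par-ID (i - 1) ID."""
    if i <= 0:
        raise ValueError("id_power needs i > 0")
    return iterate(lambda c: par(c, ID), i - 1, ID)

def place(c, k, n):
    """Put circuit c on wires k .. k + width(c) - 1 of an n-wire register."""
    rest = n - k - width(c)
    if not (k > 0 and rest > 0):
        raise ValueError("expected 0 < k < n - width(c)")
    return par(par(id_power(k), c), id_power(rest))

placed = place(HAD, 2, 5)
```

Here `placed` has width 5, and `iterate` satisfies the equation iterate(f, n + 1, a) = f(iterate(f, n, a)) that the equational theory records for iter.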
7 Implementation 


The framework described so far is implemented as a DSL embedded inside the 
Why3 deductive verification platform [9,25], written in the WhyML programming 
language. This allows us to benefit from several strengths of Why3, such as 
efficient code extraction toward OCaml, generation of proof obligations (to 
implement the HQHL mechanism) and access to several proof means: SMT solvers, 
interactive proof commands or export to proof assistants (Coq, Isabelle/HOL), 
although we do not use this latter option in our case studies. 

The development itself counts 17,000+ lines of code, including 400+ defini- 
tions and 1700+ lemmas, all proved within Why3. Most of the development con- 
cerns the (verified) mathematical libraries. They cover the mathematical struc- 
tures at stake in quantum computing (complex numbers, Kronecker product, 
bit-vectors, etc.), together with a formally verified collection of mathematical 
results. Only two theorems are assumed (for any real $x$: if $0 < x < 1$ then 
$\sin(\pi x) < \pi x$ and $x < \sin(\pi x / 2)$). Proving them requires material 
on function differentiation, not available in Why3 so far. Hence we chose to 
assume these standard results. 


8 Case studies and experimental evaluation 


We develop and prove parametric implementations of Grover’s search, the Quan- 
tum Fourier Transform (QFT), the Quantum Phase Estimation (QPE) and the 
first ever verified implementation of the quantum part of Shor’s algorithm (Shor- 
OF). We also implemented Deutsch-Jozsa (DJ) for comparison. 


8.1 Examples of formal specifications. Let us first introduce some of the 
formal specifications we proved. The specification for QPE [41,16] is shown in 
Figure 12(a). The procedure inputs a unitary operator U and an eigenvector 
$|v\rangle$ of U and finds the ghost ([26]) eigenvalue $e^{2\pi i \theta}$ 
associated with $|v\rangle$. The specification for Shor-OF [61] is shown in 
Figure 12(b). We developed a certified concrete implementation following the 
implementation proposed in [5], a reference in terms of complexity.⁶ The 
specification for Grover [31] is shown in Figure 12(c). Given a predicate with 
k true values in $[0, 2^n[$, Grover’s algorithm outputs one of these true values 
with good probability. 
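
As a rough numerical companion to the Grover specification, the sketch below evaluates the textbook closed form for the success probability after i iterations, sin^2((2i + 1) * arcsin(sqrt(k / 2^n))), with k solutions among 2^n candidates. The function name is hypothetical and the formula is used here as the standard analysis of Grover’s algorithm, not as a transcription of the verified specification.

```python
import math

def grover_success_probability(n, k, i):
    """Textbook success probability of Grover search.

    n: number of qubits (search space of size 2**n)
    k: number of inputs on which the predicate is true
    i: number of Grover iterations
    """
    theta = math.asin(math.sqrt(k / 2 ** n))
    return math.sin((2 * i + 1) * theta) ** 2

# With 4 candidates and a single solution, one iteration is enough.
p_one_iteration = grover_success_probability(2, 1, 1)

# With no iteration, measuring the uniform superposition succeeds
# with probability k / 2**n.
p_no_iteration = grover_success_probability(3, 2, 0)
```

The first value is 1 up to rounding and the second is 0.25, which matches the intuition that each iteration rotates the state towards the subspace of solutions.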

Each of these specifications makes use of specific functions that we do not 
have the space to detail here (see [13] for details). We however want to note 
two things. First, these specifications describe results of measurement (with the 
dedicated functions proba_partial_measure_x). As discussed in Section 
even if QBRICKS-DSL is not able to handle measurement, we are still able with 
QBRICKS-SPEC to reason on the result of a measurement, as this is a simple 
function over complex amplitudes. Another thing to note is that, for Shor-OF 
and Grover, our specifications discuss the polynomial size of the produced circuit. 


⁶ A further refinement is possible [5], using a hybrid version of the Quantum Fourier 
Transform, but it would require adding effective measure operation and classical 
control to QBRICKS. 
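\[ \begin{array}{l}
\Gamma, (f : \mathtt{pps}), (C : \mathtt{circ}), (|v\rangle : \mathtt{ket}), (k, n : \mathtt{int}), (\mathtt{ghost}\ \theta : \mathtt{real}), (j : \mathtt{ghost\ int}) \Vdash \\
\quad \{(C \triangleright f) \wedge \mathtt{width}(C) = n \wedge 0 < k \wedge \mathtt{Eigen}(f, |v\rangle, e^{2\pi i \theta})\} \\
\quad \mathtt{QPE}(C, k, n) \\
\quad \{\mathtt{proba\_partial\_measure\_p}(\mathtt{result}, \text{[illegible]}\,|v\rangle, \mathtt{error} < \text{[illegible]}) \ge \text{[illegible]}\ \wedge \\
\quad\ \ (\theta = \text{[illegible]} \rightarrow \mathtt{proba\_partial\_measure}(\mathtt{result}, |v\rangle, |j\rangle_k) = 1)\}
\end{array} \]


(a) Specification for our implementation of Quantum Phase estimation 


\[ \begin{array}{l}
\Gamma, (a, b, n : \mathtt{int}), (j : \mathtt{ghost\ int}) \Vdash \\
\quad \{\mathtt{co\_prime}(a, b) \wedge 1 < b < 2^n \wedge 1 < j < b \wedge a^j \,\%\, b = 1\} \\
\quad \text{Shor-circ}(a, b, n) \\
\quad \{\mathtt{proba\_partial\_measure\_p}(|1\rangle_n, \mathtt{error}_1 < \text{[illegible]}) \ge \text{[illegible]}\ \wedge \\
\quad\ \ \mathtt{proba\_partial\_measure\_p}(|1\rangle_n, \mathtt{error}_2 < \text{[illegible]}) \ge \text{[illegible]}\ \wedge \\
\quad\ \ \mathtt{size}(\mathtt{result}) = \text{Shor-poly}(n) \wedge \mathtt{ancillas}(\mathtt{result}) = n + 2 \wedge \mathtt{width}(\mathtt{result}) = 3 * n\}
\end{array} \]


(b) Specification for our implementation of Shor-OF algorithm 


\[ \begin{array}{l}
\Gamma, (C : \mathtt{circ}), (f : \mathtt{int} \rightarrow \mathtt{bool}), (n, i, k : \mathtt{int}) \Vdash \\
\quad \{\mathtt{implements}(C, f) \wedge 1 \le n \wedge 1 \le k < 2^{n-1} \wedge 1 \le i\ \wedge \\
\quad\ \ \mathtt{Card}(\{j \mid 0 \le j < 2^n \wedge f(j) = \mathtt{true}\}) = k\} \\
\quad \mathtt{Grover}(C, k, n) \\
\quad \{\mathtt{proba\_partial\_measure\_p}(\mathtt{result}, \mathtt{bv\_cst}(n, 0), f) = \sin^2\!\big(\arcsin\big(\sqrt{k / 2^n}\big)\,(1 + 2i)\big)\ \wedge \\
\quad\ \ \mathtt{size}(\mathtt{result}) = i * (\mathtt{size}(C) * O(n))\ \wedge \\
\quad\ \ \mathtt{width}(\mathtt{result}) = n \wedge \mathtt{ancillas}(\mathtt{result}) = 1\}
\end{array} \]


(c) Specification for our implementation of Grover’s algorithm 


Fig. 12: Specifications of the main implementations 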


8.2 Experimental evaluation. Different metrics about our formal developments 
are reported in Table 13: lines of decorated code, number of lemmas, 
proof obligations (PO), automatically proven PO (within a time limit of 5 
seconds) and their percentage among POs, interactive commands we entered to 
discharge them, and time required for the automatic verification of these proofs. 

Note that metrics for each implementation strictly concern the code that is 
proper to it (e.g., QPE contains calls to QFT but the QPE line in Table 13 does 
not include the QFT implementation). The whole Shor-OF development is reported 
in the “Shor-OF (full)” line. 


Result. QBRICKS did allow us to implement and verify in a parametric manner 
the Shor-OF, QPE and Grover algorithms, at a rather smooth cost and with high 
proof automation (95% on average, 95% for full Shor-OF). 


8.3 Prior verification efforts. Before comparing our approach to prior 
attempts (Table 14), we first introduce these cases. 


⁷ Experiments were run on Linux, on a PC equipped with an Intel(R) Core(TM) 
i7-7820HQ 2.90GHz and 15 GB RAM. We used Why3 version 1.2.0 with solvers 
Alt-Ergo-2.2.0, CVC3-2.4.1, CVC4-1.0, Z3-4.4.1. 
                 #LoC+Spec  #Extr.  #Def.  #Lem.   #POs  #Aut.  %Aut.  #Cmd  Verif. time
DJ                      57      11      2      1     72     61   >84%    39  1m19s
Grover                 193      39      6      8    505    479   >94%   125  4m43s
QFT                     65      18      3      0     62     53   >85%    37  1m11s
QPE                    175      24      3      8    282    262   >92%    94  4m35s
Shor-OF                923     132     28     14   2473   2386   >96%   421  <18m
Shor-OF (full)        1163     174     34     22   2817   2701   >95%   552  <23m
Total                 1423     224     42     31   3394   3241   >95%   716  <29m


#LoC+Spec: lines of decorated code; #Extr.: lines of extracted code (OCaml); 
#Aut.: automatically proven POs; #Cmd: interactive commands; 
Verif. time: automated proof verification time 


Table 13: Implementation & verification for case studies with QBRICKS 
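
The automation percentages can be rechecked from the PO counts. The short script below is a sanity check on the figures of Table 13, not part of the experimental setup; it assumes the percentages are lower bounds obtained by truncation, and it also checks that the “Shor-OF (full)” and “Total” PO counts are sums of the corresponding rows.

```python
# (number of POs, automatically proven POs, reported lower bound in percent)
rows = {
    "DJ": (72, 61, 84),
    "Grover": (505, 479, 94),
    "QFT": (62, 53, 85),
    "QPE": (282, 262, 92),
    "Shor-OF": (2473, 2386, 96),
    "Shor-OF (full)": (2817, 2701, 95),
    "Total": (3394, 3241, 95),
}

def automation_percent(pos, aut):
    """Share of automatically proven POs, truncated to an integer percent."""
    return (100 * aut) // pos

consistent = all(automation_percent(pos, aut) == pct
                 for pos, aut, pct in rows.values())

# "Shor-OF (full)" aggregates Shor-OF with the QFT and QPE developments.
full_pos = sum(rows[r][0] for r in ("Shor-OF", "QFT", "QPE"))
full_aut = sum(rows[r][1] for r in ("Shor-OF", "QFT", "QPE"))

# "Total" aggregates DJ, Grover and the full Shor-OF development.
total_pos = sum(rows[r][0] for r in ("DJ", "Grover", "Shor-OF (full)"))
total_aut = sum(rows[r][1] for r in ("DJ", "Grover", "Shor-OF (full)"))
```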


Regular path-sums. [2,1] uses path sums for the verification of several circuits 
of complexity similar to that of QFT (QFT, Hidden shift, generalized Toffoli, 
etc). Yet, these experiments consider fixed circuits (up to 100 qubits) and the 
technique cannot be applied to parametric families of circuits or circuit-building 
languages. 


QHL. Liu et al. [45] report on the parametric verification of the Grover search 
algorithm, on a restricted case⁸ and in the high-level algorithm description 
formalism of QHL; in particular, QHL has no notion of circuit. So, for instance, 
one cannot reason about the size of a circuit within QHL. 


SQIR. Finally, Hietala et al. [35] have presented a parametric (circuit-building) 
implementation of the Deutsch-Jozsa algorithm in Coq, with two independent 
full correctness proofs. Recently (Oct. 2020), the authors also presented 
parametrized versions of the QFT, QPE and Grover algorithms [34]. 


8.4 Evaluation: benefits of PPS and QBRICKS. To evaluate the proof-effort 
gain of using pps instead of matrices, Table 14 compares our case-study 
implementations with equivalent proved implementations from the literature: the 
Grover algorithm implementation from [45] in Isabelle/HOL and the 
implementations [35,34] using SQIR and Coq. As supplementary comparison terms, 
we implemented QBRICKS versions of both QFT and Deutsch-Jozsa using exclusively 
matrices. 

For example the QBRICKS implementation of QFT with pps is 18 lines long, 
with 47 lines of specifications and intermediary lemmas, and its proof required 37 
additional interactive commands, hence Spec + Cmd = 84. In comparison, the 
corresponding SQIR development uses 287 interactive commands (7.7x more). 
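
The arithmetic behind this comparison is simple enough to replay. The snippet below only recomputes the two quantities quoted for QFT from the numbers given in the text; the variable names are ours.

```python
# QBRICKS QFT with pps: specification lines and interactive commands.
spec_lines = 47
commands = 37
spec_plus_cmd = spec_lines + commands

# Interactive commands of the corresponding SQIR development.
sqir_commands = 287
command_ratio = sqir_commands / commands
```

This gives Spec + Cmd = 84 and a command ratio slightly above 7.7, in line with the “7.7x more” reported above.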


Conclusion. Relying on PPS semantics and first-order logic instead of matrices 
and higher-order logics strongly eases the proof effort. In terms of command 


⁸ The case in [45, p. 232] concerns cases where the number k of sought values is 
equal to $2^j$ for a given integer j. 




          QBRICKS pps                  QBRICKS Matrix
          LoC  Spec  Cmd  Spec+Cmd     LoC  Spec  Cmd             Spec+Cmd
DJ         11    46   39        85      11   129   131 (>3.3x)    260 (>3x)
QFT        18    47   37        84      18   172   106 (>2.8x)    278 (>3.3x)
Grover     39   154  125       279
QPE        23   152   94       246

          SQIR                                    QHL
          LoC  Spec  Cmd            Spec+Cmd      LoC  Spec  Cmd              Spec+Cmd
DJ         10    39  222 (>5.6x)    261 (>3x)
QFT        10    44  287 (>7.7x)    331 (>3.9x)
Grover     15   121  805 (>6.4x)    926 (>3.3x)    90  1263  1712 (>13.6x)   2975 (>10.6x)
QPE        40    86  726 (>7.7x)    812 (>3.3x)


#LoC: lines of code; #Spec: lines of spec. and lemmas; #Cmd: proof commands 


Table 14: Compared implementations of case studies, using matrices and pps 


lines, proofs are consistently at least 5.6x shorter than non-QBRICKS examples, 
up to 13.6x for the case of Grover in QHL and 7.7x for QPE and QFT in SQIR.⁹ 


9 Related works 


Formal verification of quantum circuits. Prior efforts regarding quantum 
circuit verification [27/457 70)53)56)1/2[35/34] have been described throughout the 
paper, especially in Sections 1, 3.1 and 8. Our technique is more automated than 
those based on interactive proving [35,34,45], borrows and extends the path sum 
representation [2] to the parametric case, and considers a circuit-building 
language rather than a high-level algorithm description language [45]. 


Quantum Languages and Deductive Verification. Liu et al. intro- 
duce Quantum Hoare Logic for high-level description of quantum algorithms. 
QHL and our own HQHL are different, as the underlying formalisms have differ- 
ent focus. While QHL deals with measurement and classical control, it does not 
allow reasoning on the structure of the circuit. On the other hand, QBRICKS does 
not handle classical control, but it brings better proof automation and deduc- 
tion rules for reasoning on circuits. Combining the two approaches is an exciting 
research direction. 


Verified Circuit Optimizations. Formal methods and other program analysis 
techniques are also used in quantum compilation for verifying circuit 
optimization techniques [526208162157185]. In particular, the ZX-calculus [17] represents 


⁹ The difference with SQIR in the column “Spec+Cmd” is less striking. Indeed, 
SQIR syntax for specifications is often more succinct: e.g., QBRICKS writes each 
precondition on a separate line, whereas Coq writes the same as a single-line 
conjunction. 
quantum circuits by diagrams amenable to automatic simplification through 
dedicated rewriting rules. This framework leads to a graphical proof assistant [40] 
geared at certifying the semantic equivalence between circuit diagrams, with 
application to circuit equivalence checking and certified circuit compilation 
and optimization. Yet, formal tools based on ZX-calculus are restricted to 
fixed circuits, and parametrized approaches are so far limited to pen-and-paper 
proofs [12]. 


Other quantum applications of formal methods. Huang et al. [36,37] 
propose a “runtime-monitoring like” verification method for quantum circuits, 
with an annotation language restricted to structural properties of interest (e.g., 
superposition or entanglement). Similarly, [44] describes a projection-based 
assertion language for quantum programs. Verification of these assertions is 
carried out by statistical testing instead of formal proofs. The recent Silq 
language [8] also represents an advance toward automation in quantum 
programming. It automates uncomputation operations, enabling the programmer to 
abstract from low-level implementation details. Specialized type systems for 
quantum programming languages, based on linear logic [60,59,43] and dependent 
types [51,53], have also been developed to tackle the non-duplicability of 
qubits and structural circuit constraints. Finally, formal methods are also used 
for the verification of quantum cryptography protocols [49,29,11,47,19]. 


10 Conclusion 


We address the problem of automating correctness proofs of quantum programs. 
While relying on the general framework of deductive verification, we finely tune 
our domain-specific circuit-building language QBRICKS-DSL together with its 
new logical specification language QBRICKS-SPEC in order to keep correctness 
reasoning over relevant quantum programs within first-order theory. Also, we 
introduce and intensively build upon parametrized path sums (PPS), a sym- 
bolic representation for quantum circuits represented as functions transforming 
quantum data registers. We develop verified parametric implementations of the 
Shor-OF algorithm (first verified implementation) and other famous non-trivial 
quantum algorithms (including QPE and Grover search), showing significant 
improvement over prior attempts — when available. 
Abstract. Session types statically describe communication protocols 
between concurrent message-passing processes. Unfortunately, paramet- 
ric polymorphism even in its restricted prenex form is not fully under- 
stood in the context of session types. In this paper, we present the 
metatheory of session types extended with prenex polymorphism and, 
as a result, nested recursive datatypes. Remarkably, we prove that type 
equality is decidable by exhibiting a reduction to trace equivalence of de- 
terministic first-order grammars. Recognizing the high theoretical com- 
plexity of the latter, we also propose a novel type equality algorithm 
and prove its soundness. We observe that the algorithm is surprisingly 
efficient and, despite its incompleteness, sufficient for all our examples. 
We have implemented our ideas by extending the Rast programming 
language with nested session types. We conclude with several examples 
illustrating the expressivity of our enhanced type system. 


1 Introduction 


Session types express and enforce interaction protocols in message-passing 
systems [29,44]. In this work, we focus on binary session types that describe 
bilateral protocols between two endpoint processes performing dual actions. 
Binary session types obtained a firm logical foundation since they were shown to 
be in a Curry-Howard correspondence with linear logic propositions [7,8,47]. This 
allows us to rely on properties of cut reduction to derive type safety properties 
such as progress (deadlock freedom) and preservation (session fidelity), which 
continue to hold even when extended to recursive types and processes [17]. 

However, the theory of session types is still missing a crucial piece: a general 
understanding of prenex (or ML-style) parametric polymorphism, encompass- 
ing recursively defined types, polymorphic type constructors, and nested types. 
We abbreviate the sum of these features simply as nested types [3]. Prior work 
has restricted itself to parametric polymorphism either: in prenex form without 
nested types [26,45]; with explicit higher-rank quantifiers [6,38] (including 
bounded ones [24]) but without general recursion; or in specialized form for 
iteration at the type level [46]. None of these allow a free, nested use of 
polymorphic type constructors combined with prenex polymorphism. 
In this paper, we develop the metatheory of this rich language of nested session 
types. Nested types are reasonably well understood in the context of functional 
languages [3,32] and have a number of interesting applications [I10[28]37]. 
One difficult point is the interaction of nested types with polymorphic recursion 
and type inference [36]. By adopting bidirectional type-checking we avoid this 
particular set of problems altogether, at the cost of some additional verbosity. 
However, we have a new problem, namely how to handle type equality ($\equiv$), 
given that session type definitions are generally equirecursive and not 
generative. This means that even before we consider nesting, with the definitions 


\[ \mathsf{list}[\alpha] = \oplus\{\mathsf{nil} : \mathbf{1},\ \mathsf{cons} : \alpha \otimes \mathsf{list}[\alpha]\}
   \qquad
   \mathsf{list}'[\alpha] = \oplus\{\mathsf{nil} : \mathbf{1},\ \mathsf{cons} : \alpha \otimes \mathsf{list}'[\alpha]\} \]


we have $\mathsf{list}[A] \equiv \mathsf{list}'[B]$ and also 
$\mathsf{list}[\mathsf{list}'[A]] \equiv \mathsf{list}'[\mathsf{list}[B]]$ 
provided $A \equiv B$. The reason is that both types specify the same 
communication behavior; only their name (which is irrelevant) is different. As 
the second of these equalities shows, deciding the equality of nested 
occurrences of type constructors is inescapable: allowing type constructors 
(which are necessary in many practical examples) means we also have to solve type 
equality for nested types. For example, the types $\mathsf{Tree}[\alpha]$ and 
$\mathsf{STree}[\alpha, \kappa]$ represent binary trees and their faithfully (and 
efficiently) serialized form respectively. 


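\[ \mathsf{Tree}[\alpha] = \oplus\{\mathsf{node} : \mathsf{Tree}[\alpha] \otimes \alpha \otimes \mathsf{Tree}[\alpha],\ \mathsf{leaf} : \mathbf{1}\} \]
\[ \mathsf{STree}[\alpha, \kappa] = \oplus\{\mathsf{nd} : \mathsf{STree}[\alpha, \alpha \otimes \mathsf{STree}[\alpha, \kappa]],\ \mathsf{lf} : \kappa\} \]


We have that $\mathsf{Tree}[\alpha] \otimes \kappa$ is isomorphic to $\mathsf{STree}[\alpha, \kappa]$ and that the processes 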
witnessing the isomorphism can be easily implemented (see Section J). 
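
The serialized form can be pictured with ordinary data instead of processes. In the sketch below, which is an illustration and not Rast code, a tree is either None (a leaf) or a triple (left, value, right), and the serialized form is a flat list of labels and values. Reading the type STree, a node is sent as the label nd, then the serialized left subtree, then the value, then the serialized right subtree; the position returned by `deserialize` plays the role of the continuation.

```python
def serialize(tree):
    """Flatten a tree into the label/value sequence described by STree."""
    if tree is None:
        return ["lf"]
    left, value, right = tree
    return ["nd"] + serialize(left) + [value] + serialize(right)

def deserialize(tokens, pos=0):
    """Rebuild a tree from tokens[pos:]; return (tree, next position)."""
    if tokens[pos] == "lf":
        return None, pos + 1
    # tokens[pos] is "nd": left subtree, then the value, then right subtree.
    left, pos = deserialize(tokens, pos + 1)
    value = tokens[pos]
    right, pos = deserialize(tokens, pos + 1)
    return (left, value, right), pos

tree = ((None, 1, None), 2, None)
tokens = serialize(tree)
rebuilt, rest = deserialize(tokens + ["continuation"])
```

The round trip returns the original tree and stops exactly at the first token that belongs to the continuation, which is the informal content of the isomorphism mentioned above.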

At the core of type checking lies type equality. We show that we can translate 
type equality for nested session types to the trace equivalence problem for de- 
terministic first-order grammars, shown to be decidable by Jančar, albeit with 
doubly-exponential complexity [31]. Solomon [42] already proved a related con- 
nection between inductive type equality for nested types and language equality 
for deterministic pushdown automata (DPDA). The difference is that the stan- 
dard session type equality is defined coinductively, as a bisimulation, rather than 
via language equivalence [23]. This is because session types capture 
communication behavior rather than the structure of closed values, so a type such as 
$R = \oplus\{a : R\}$ is not equal to the empty type $E = \oplus\{\}$. The reason is that the 
former type can send infinitely many a’s while the latter cannot, and hence their 
communication behavior is different, implying that the types must be different. 
Interestingly, if we imagine a lazy functional language such as Haskell with non- 
generative recursive types, then R and E would also be different. In fact, nothing 
in our analysis of equirecursive nested types depends on linearity, just on the 
coinductive interpretation of types. Our key results, namely decidability of type 
equality and a practical algorithm for it, apply to lazy functional languages! 

The decision procedure for deterministic first-order grammars does not ap- 
pear to be directly suitable for implementation, in part due to its doubly- 
exponential complexity bound. Instead we develop an algorithm combining loop 
detection [23] with instantiation [18] and a special treatment of reflexivity. The 
algorithm is sound, but incomplete, and reports success, a counterexample, or an 
inconclusive outcome (which counts as failure). In our experience, the algorithm 
is surprisingly efficient and sufficient for all our examples. 
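
To give an idea of the loop-detection ingredient, here is a deliberately simplified sketch. It handles only type definitions without parameters, so it involves no nesting, no instantiation and no special treatment of reflexivity, and it is not the algorithm of this paper. The tuple encoding and all names are assumptions of the example, and definitions are assumed to be contractive (a name never unfolds directly to itself).

```python
ONE = ("one",)

def name(n):
    return ("name", n)

def tensor(a, b):
    return ("tensor", a, b)

def plus(**choices):
    # Internal choice; labels are sorted so that types are hashable values.
    return ("plus", tuple(sorted(choices.items())))

def unfold(t, defs):
    """Replace a type name by its definition until a constructor appears."""
    while t[0] == "name":
        t = defs[t[1]]
    return t

def equal(a, b, defs, seen=frozenset()):
    """Coinductive equality: a pair met again is assumed equal (loop detection)."""
    if (a, b) in seen:
        return True
    if a[0] == "name" or b[0] == "name":
        seen = seen | {(a, b)}
    a, b = unfold(a, defs), unfold(b, defs)
    if a[0] != b[0]:
        return False
    if a[0] == "one":
        return True
    if a[0] == "plus":
        ca, cb = dict(a[1]), dict(b[1])
        return ca.keys() == cb.keys() and all(
            equal(ca[l], cb[l], defs, seen) for l in ca)
    # Binary constructor: compare both components.
    return equal(a[1], b[1], defs, seen) and equal(a[2], b[2], defs, seen)

defs = {
    "nat": plus(z=ONE, s=name("nat")),
    "list": plus(nil=ONE, cons=tensor(name("nat"), name("list"))),
    "list2": plus(nil=ONE, cons=tensor(name("nat"), name("list2"))),
    "R": plus(a=name("R")),
    "E": plus(),
}

lists_equal = equal(name("list"), name("list2"), defs)
r_equals_e = equal(name("R"), name("E"), defs)
```

On these definitions the two list types are found equal, while R and E are distinguished because R offers the label a and E offers none. With parameterized definitions the set of pairs to remember is no longer finite in general, which is where instantiation comes in.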

We have implemented nested session types and integrated them with the 
Rast language that is based on session types [17,18,19]. We have evaluated our 
prototype on several examples such as the Dyck language [21], an expression 
server [45] and serializing binary trees, and standard polymorphic data structures 
such as lists, stacks and queues. 

Most closely related to our work is context-free session types (CFSTs) [5]. 
CFSTs also enhance the expressive power of binary session types by extend- 
ing types with a notion of sequential composition of types. In connection with 
CFSTs, we identified a proper fragment of nested session types closed under 
sequential composition and therefore nested session types are strictly more ex- 
pressive than CFSTs. 

The main technical contributions of our work are: 


— A uniform language of session types supporting prenex polymorphism, type 
constructors, and nested types and its type safety proof (Sections (6). 

— A proof of decidability of type equality (Section 4). 

— A practical algorithm for type equality and its soundness proof (Section 5). 

— A proper fragment of nested session types that is closed under sequential 
composition, the main feature of context-free session types (Section 

— An implementation and integration with the Rast language (Section 8). 


2 Overview of Nested Session Types 


The main motivation for studying nested types is quite practical and generally 
applicable to programming languages with structural type systems. We start 
by applying parametric type constructors for a standard polymorphic queue 
data structure. We also demonstrate how the types can be made more precise 
using nesting. A natural consequence of having nested types is the ability to 
capture (communication) patterns characterized by context-free languages. As 
an illustration, we express the Dyck language of balanced parentheses and show 
how nested types are connected to DPDAs also. 


Queues A standard application of parameterized types is the definition of 
polymorphic data structures such as lists, stacks, or queues. As a simple example, 
consider the nested type: 


\[ \mathsf{queue}[\alpha] = \&\{\mathsf{ins} : \alpha \multimap \mathsf{queue}[\alpha],\ \mathsf{del} : \oplus\{\mathsf{none} : \mathbf{1},\ \mathsf{some} : \alpha \otimes \mathsf{queue}[\alpha]\}\} \]


The type queue, parameterized by $\alpha$, represents a queue with values of type 
$\alpha$. A process providing this type offers an external choice ($\&$) enabling 
the client to either insert a value of type $\alpha$ in the queue (label ins), or 
to delete a value from the queue (label del). After receiving label ins, the 
provider expects to receive a value of type $\alpha$ (the $\multimap$ operator) 
and then proceeds to offer $\mathsf{queue}[\alpha]$. Upon reception of the label 
del, the provider queue is either empty, in which case it sends the label none 
and terminates the session (as prescribed by type $\mathbf{1}$), or is non-empty, 
in which case it sends a value of type $\alpha$ (the $\otimes$ operator) and 
recurses with $\mathsf{queue}[\alpha]$. 

Although parameterized type definitions are sufficient to express the standard 
interface to polymorphic data structures, we propose nested session types which 
are considerably more expressive. For instance, we can use type parameters to 
track the number of elements in the queue in its type! 


\[
\begin{array}{l}
\mathsf{queue}[\alpha, x] \triangleq \&\{\mathsf{ins} : \alpha \multimap \mathsf{queue}[\alpha, \mathsf{Some}[\alpha, x]],\ \mathsf{del} : x\} \\
\mathsf{Some}[\alpha, x] \triangleq \oplus\{\mathsf{some} : \alpha \otimes \mathsf{queue}[\alpha, x]\} \qquad
\mathsf{None} \triangleq \oplus\{\mathsf{none} : \mathbf{1}\}
\end{array}
\]

The second type parameter x tracks the number of elements. This parameter
can be understood as a symbol stack. On inserting an element, we recurse to
queue[α, Some[α, x]], denoting the push of a Some symbol on stack x. We initiate
the empty queue with the type queue[α, None], where the second parameter denotes
an empty symbol stack. Thus, a queue with n elements would have the type
queue[α, Some^n[α, None]]. On receipt of the del label, the type transitions to x,
which can either be None (if the queue is empty) or Some[α, x] (if the queue
is non-empty). In the latter case, the type sends label some followed by an
element, and transitions to queue[α, x], denoting a pop from the symbol stack.
In the former case, the type sends the label none and terminates. Both these
behaviors are reflected in the definitions of types Some and None.
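
The push and pop reading of the second parameter can be made concrete with a small sketch. The encoding below is an assumption made for illustration (nested tuples for the symbol stack, with the payload type α and the actual value transfer abstracted away); the function names are hypothetical and are not part of Rast.

```python
# Hypothetical encoding of the second parameter x of queue[a, x]:
# "None" is the empty symbol stack; ("Some", x) pushes one Some symbol onto x.
NONE = "None"

def after_ins(stack):
    """queue[a, x] --ins--> queue[a, Some[a, x]]: push one Some symbol."""
    return ("Some", stack)

def after_del(stack):
    """queue[a, x] --del--> x: report the label that x sends and the next stack.

    None sends 'none' and terminates, so there is no next stack.
    Some[a, x'] sends 'some' and continues as queue[a, x'] (a pop).
    """
    if stack == NONE:
        return "none", None
    return "some", stack[1]

def size(stack):
    """Count the Some symbols, i.e. n in queue[a, Some^n[a, None]]."""
    n = 0
    while stack != NONE:
        stack = stack[1]
        n += 1
    return n

two = after_ins(after_ins(NONE))   # type after two inserts
label, one = after_del(two)        # a delete on a non-empty queue
```

Here `size(two)` is 2 and the delete answers with `some`, leaving a stack of size 1, which mirrors the transition from queue[α, Some[α, x]] back to queue[α, x].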


Context-Free Languages Recursive session types capture the class of regular 
languages [45]. However, in practice, many useful languages are beyond regular. 
As an illustration, suppose we would like to express a balanced parentheses 
language, also known as the Dyck language with the end-marker $. We use 
L to denote an opening symbol, and R to denote a closing symbol (in a session-
typed mindset, L can represent a client request and R a server response). We
need to enforce that each L has a corresponding closing R and they are properly 
nested. To express this, we need to track the number of L’s in the output with 
the session type. However, this notion of memory is beyond the expressive power 
of regular languages, so mere recursive session types will not suffice. 
We utilize the expressive power of nested types to express this behavior. 


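\[
T[x] \triangleq \oplus\{\mathsf{L} : T[T[x]],\ \mathsf{R} : x\} \qquad
D \triangleq \oplus\{\mathsf{L} : T[D],\ \mathsf{\$} : \mathbf{1}\}
\]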


The nested type T[x] takes x as a type parameter and either outputs L and
continues with T[T[x]], or outputs R and continues with x. The type D either
outputs L and continues with T[D], or outputs $ and terminates. The type D
expresses a Dyck word with end-marker $ [34].

The key idea here is that the number of T's in the type of a word tracks the
number of unmatched L's in it. Whenever the type T[x] outputs L, it recurses
with T[T[x]], incrementing the number of T's in the type by 1. Dually, whenever
the type outputs R, it recurses with x, decrementing the number of T's in the type
by 1. The type D denotes a balanced word with no unmatched L's. Moreover,
since we can only output $ (or L) at the type D and not R, we obtain the
invariant that any word of type D must be balanced. If we imagine the parameter
x as the symbol stack, outputting an L pushes T on the stack, while outputting R
pops T from the stack. The definition of D ensures that once an L is outputted,
the symbol stack is initialized with T[D], indicating one unmatched L.
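
As a sanity check of this reading, the following sketch follows a word label by label through the definitions of T and D. The tuple encoding of types and the helper names (`subst`, `step`, `accepts`) are assumptions made for this illustration, not part of the formal development.

```python
# Assumed encoding: "1" is the terminated session, any other string is a type
# variable, and a tuple (name, arg, ...) applies a defined type name.
SIG = {
    "T": (("x",), {"L": ("T", ("T", "x")), "R": "x"}),  # T[x] = +{L: T[T[x]], R: x}
    "D": ((), {"L": ("T", ("D",)), "$": "1"}),          # D    = +{L: T[D], $: 1}
}

def subst(ty, env):
    """Replace type variables in ty according to env."""
    if isinstance(ty, str):
        return env.get(ty, ty)
    return (ty[0],) + tuple(subst(arg, env) for arg in ty[1:])

def step(ty, label):
    """Unfold the type name at the head of ty and follow one label, if offered."""
    if isinstance(ty, str):          # "1" or a variable offers no label
        return None
    params, branches = SIG[ty[0]]
    if label not in branches:
        return None
    return subst(branches[label], dict(zip(params, ty[1:])))

def accepts(ty, word):
    """True if the labels of word lead from ty to the terminated session 1."""
    for label in word:
        ty = step(ty, label)
        if ty is None:
            return False
    return ty == "1"
```

With this encoding, `accepts(("D",), "LLRR$")` holds, while `"LRR$"` is rejected because D offers no R, and `"L$"` is rejected because T[D] offers no $.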

Nested session types do not restrict communication so that the words repre- 
sented have to be balanced. To this end, the type D’ can model the cropped Dyck 
language, where unbalanced words can be captured. 


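\[
T'[x] \triangleq \oplus\{\mathsf{L} : T'[T'[x]],\ \mathsf{R} : x,\ \mathsf{\$} : \mathbf{1}\} \qquad
D' \triangleq \oplus\{\mathsf{L} : T'[D'],\ \mathsf{\$} : \mathbf{1}\}
\]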


The only difference between types T[x] and T'[x] is that T'[x] allows us to
terminate at any point using the $ label, which immediately transitions to type 1.
Nested session types can not only capture the class of deterministic context-free 
languages recognized by DPDAs that accept by empty stack (balanced words), 
but also the class of deterministic context-free languages recognized by DPDAs 
that accept by final state (cropped words). 


Multiple Kinds of Parentheses We can use nested types to express more 
general words with different kinds of parentheses. Let L and L’ denote two kinds 
of opening symbols, while R and R’ denote their corresponding closing symbols 
respectively. We define the session types 


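\[
\begin{array}{l}
S[x] \triangleq \oplus\{\mathsf{L} : S[S[x]],\ \mathsf{L'} : S'[S[x]],\ \mathsf{R} : x\} \\
S'[x] \triangleq \oplus\{\mathsf{L} : S[S'[x]],\ \mathsf{L'} : S'[S'[x]],\ \mathsf{R'} : x\} \\
E \triangleq \oplus\{\mathsf{L} : S[E],\ \mathsf{L'} : S'[E],\ \mathsf{\$} : \mathbf{1}\}
\end{array}
\]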


We push symbols S and S’ to the stack on outputting L and L’ respectively. 
Dually, we pop S and S’ from the stack on outputting R and R’ respectively. 
Then, the type E defines an empty stack, thereby representing a balanced Dyck 
word. This technique can be generalized to any number of kinds of brackets. 


Multiple States as Multiple Parameters Using defined type names with 
multiple type parameters, we enable types to capture the language of DPDAs 
with several states. Consider the language L3 = {L^n a R^n a ∪ L^n b R^n b | n > 0},
proposed by Korenjak and Hopcroft [34]. A word in this language starts with 
a sequence of opening symbols L, followed by an intermediate symbol, either a 
or b. Then, the word contains as many closing symbols R as there were Ls and 
terminates with the symbol a or b matching the intermediate symbol. 


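\[
\begin{array}{l}
U \triangleq \oplus\{\mathsf{L} : O[C[A], C[B]]\} \qquad
O[x, y] \triangleq \oplus\{\mathsf{L} : O[C[x], C[y]],\ \mathsf{a} : x,\ \mathsf{b} : y\} \\
C[x] \triangleq \oplus\{\mathsf{R} : x\} \qquad
A \triangleq \oplus\{\mathsf{a} : \mathbf{1}\} \qquad
B \triangleq \oplus\{\mathsf{b} : \mathbf{1}\}
\end{array}
\]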


The L3 language is characterized by session type U. Since the type U is unaware
of which intermediate symbol among a or b would eventually be chosen, it
maintains two symbol stacks in the two type parameters x and y of O.
We initiate type U with outputting L and transitioning to O[C[A], C[B]], where
the symbol C tracks that we have outputted one L. The types A and B represent
the intermediate symbols that might be used in the future. The type O[x, y] can
either output an L and transition to O[C[x], C[y]], pushing the symbol C onto
both stacks; or it can output a (or b) and transition to the first (resp. second)
type parameter x (resp. y). Intuitively, the type parameter x would have the
form C^n[A] for n > 0 (resp. y would be C^n[B]). Then, the type C[x] would
output an R and pop the symbol C from the stack by transitioning to x. Once
all the closing symbols have been outputted (note that we cannot terminate
preemptively), we transition to type A or B depending on the intermediate symbol
chosen. Type A outputs a and terminates, and similarly, type B outputs b and
terminates. Thus, we simulate the L3 language (not possible with context-free
session types [45]) using two type parameters.
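
The two-stack behavior of U can be traced with a short sketch. As an assumption for illustration, types are encoded as nested tuples and the helper names are hypothetical; the signature itself follows the definitions of U, O, C, A and B given above.

```python
# Assumed encoding: "1" is the terminated session, other strings are type
# variables, and a tuple (name, arg, ...) applies a defined type name.
SIG = {
    "U": ((), {"L": ("O", ("C", ("A",)), ("C", ("B",)))}),
    "O": (("x", "y"), {"L": ("O", ("C", "x"), ("C", "y")), "a": "x", "b": "y"}),
    "C": (("x",), {"R": "x"}),
    "A": ((), {"a": "1"}),
    "B": ((), {"b": "1"}),
}

def subst(ty, env):
    """Replace type variables in ty according to env."""
    if isinstance(ty, str):
        return env.get(ty, ty)
    return (ty[0],) + tuple(subst(arg, env) for arg in ty[1:])

def step(ty, label):
    """Unfold the head type name and follow one label; None if not offered."""
    if isinstance(ty, str):
        return None
    params, branches = SIG[ty[0]]
    if label not in branches:
        return None
    return subst(branches[label], dict(zip(params, ty[1:])))

def in_L3(word):
    """True if word drives U to the terminated session 1."""
    ty = ("U",)
    for label in word:
        ty = step(ty, label)
        if ty is None:
            return False
    return ty == "1"
```

For example, `in_L3("LLaRRa")` holds, whereas `"LLaRRb"` fails at the last step: after choosing a, only the first stack C[C[A]] survives, and its bottom symbol A offers a but not b.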

More broadly, nested types can neatly capture complex server-client inter- 
actions. For instance, client requests can be captured using labels L, L’ while 
server responses can be captured using labels R, R’ expressing multiple kinds of 
requests. Balanced words will then represent that all requests have been handled. 
The types can also guarantee that responses do not exceed requests. 


3 Description of Types 


The underlying base system of session types is derived from a Curry-Howard 
interpretation of intuitionistic linear logic [25]. Below we describe the session 
types, their operational interpretation and the continuation type. 


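\[
\begin{array}{rcll}
A, B, C & ::= & \oplus\{\ell : A_\ell\}_{\ell \in L} & \text{send label } k \in L \text{, continue at type } A_k \\
 & \mid & \&\{\ell : A_\ell\}_{\ell \in L} & \text{receive label } k \in L \text{, continue at type } A_k \\
 & \mid & A \otimes B & \text{send channel } a : A \text{, continue at type } B \\
 & \mid & A \multimap B & \text{receive channel } a : A \text{, continue at type } B \\
 & \mid & \mathbf{1} & \text{send close message, no continuation} \\
 & \mid & \alpha & \text{type variable} \\
 & \mid & V[\overline{B}] & \text{defined type name}
\end{array}
\]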


The basic type operators have the usual interpretation: the internal choice
operator ⊕{ℓ : A_ℓ}_{ℓ ∈ L} selects a branch with label k ∈ L with corresponding
continuation type A_k; the external choice operator &{ℓ : A_ℓ}_{ℓ ∈ L} offers a choice
with labels ℓ ∈ L with corresponding continuation types A_ℓ; the tensor operator
A ⊗ B represents the channel passing type that consists of sending a channel of
type A and proceeding with type B; dually, the lolli operator A ⊸ B consists
of receiving a channel of type A and continuing with type B; the terminated
session 1 is the operator that closes the session.

We also support type constructors to define new type names. A type name
V is defined according to a type definition V[ᾱ] ≜ A that is parameterized by a
sequence of distinct type variables ᾱ that the type A can refer to. We can use
type names in a type expression using V[B̄]. Type expressions can also refer to
a parameter α available in scope. The free variables in type A refer to the set of
type variables that occur freely in A. Types without any free variables are called
closed types. We call any type not of the form V[B̄] structural.

All type definitions are stored in a finite global signature Σ defined as

\[
\text{Signature} \quad \Sigma ::= \cdot \mid \Sigma, V[\overline{\alpha}] \triangleq A
\]


In a valid signature, all definitions V[ᾱ] ≜ A are contractive, meaning that A
is structural, i.e., not itself a type name. This allows us to take an equirecursive
view of type definitions, which means that unfolding a type definition does not
require communication. More concretely, the type V[B̄] is considered equivalent
to its unfolding A[B̄/ᾱ]. We can easily adapt our definitions to an isorecursive
view with explicit unfold messages. All type names V occurring in a valid
signature must be defined, and all type variables defined in a valid definition
must be distinct. Furthermore, for a valid definition V[ᾱ] ≜ A, the free variables
occurring in A must be contained in ᾱ. This top-level scoping of all type variables
is what we call the prenex form of polymorphism.
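
The four validity conditions just listed (contractiveness, defined names, distinct parameters, and free variables bound by the parameters) can be checked mechanically. The following is a minimal sketch under an assumed tuple encoding of types; the tag names and the function `check_signature` are hypothetical and are not taken from the Rast implementation.

```python
# Assumed encoding of types as tagged tuples:
#   ("plus", {label: A}), ("with", {label: A}), ("tensor", A, B), ("lolli", A, B),
#   ("one",), ("var", a), ("name", V, [args])
# A signature maps each type name to (parameter list, body).

def children(ty):
    """Immediate subterms of a type."""
    tag = ty[0]
    if tag in ("plus", "with"):
        return list(ty[1].values())
    if tag in ("tensor", "lolli"):
        return [ty[1], ty[2]]
    if tag == "name":
        return list(ty[2])
    return []  # ("one",) and ("var", a) have no subterms

def free_vars(ty):
    if ty[0] == "var":
        return {ty[1]}
    return set().union(*(free_vars(c) for c in children(ty)))

def names_used(ty):
    here = {ty[1]} if ty[0] == "name" else set()
    return here.union(*(names_used(c) for c in children(ty)))

def check_signature(sig):
    """Return a list of (type name, problem) pairs; empty if sig is valid."""
    problems = []
    for name, (params, body) in sig.items():
        if body[0] == "name":
            problems.append((name, "not contractive"))
        if len(set(params)) != len(params):
            problems.append((name, "repeated type variable"))
        if not free_vars(body) <= set(params):
            problems.append((name, "free variable outside parameters"))
        if not names_used(body) <= set(sig):
            problems.append((name, "undefined type name"))
    return problems

QUEUE = {"queue": (["a"], ("with", {
    "ins": ("lolli", ("var", "a"), ("name", "queue", [("var", "a")])),
    "del": ("plus", {"none": ("one",),
                     "some": ("tensor", ("var", "a"), ("name", "queue", [("var", "a")]))}),
}))}
BAD = {"V": (["a", "a"], ("name", "W", [("var", "b")]))}
```

The queue signature passes all checks, while the deliberately broken definition of V violates every one of them.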


4 Type Equality 


Central to any practical type checking algorithm is type equality. In our system, 
it is necessary for the rule of identity (forwarding) and process spawn, as well as 
the channel-passing constructs for types A ⊗ B and A ⊸ B. However, with nested
polymorphic recursion, checking equality becomes challenging. We first develop 
the underlying theory of equality providing its definition, and then establish its 
reduction to checking trace equivalence of deterministic first-order grammars. 


4.1 Type Equality Definition 


Intuitively, two types are equal if they permit exactly the same communica- 
tion behavior. Formally, type equality is captured using a coinductive definition 
following seminal work by Gay and Hole [23]. 


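Definition 1. We first define unfold_Σ(A) as

\[
\frac{V[\overline{\alpha}] \triangleq A \in \Sigma}
     {\mathsf{unfold}_\Sigma(V[\overline{B}]) = A[\overline{B}/\overline{\alpha}]}
\qquad
\frac{A \neq V[\overline{B}]}
     {\mathsf{unfold}_\Sigma(A) = A}
\]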


Unfolding a structural type simply returns A. Since type definitions are con- 
tractive [23], the result of unfolding is never a type name application and it 
always terminates in one step. 
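
A minimal sketch of the unfold operation follows, assuming a tagged-tuple encoding of types (the tags and helper names are illustrative assumptions). It performs exactly one substitution step for a type name and is the identity on structural types.

```python
# Assumed encoding: ("plus", {label: A}), ("with", {label: A}), ("tensor", A, B),
# ("lolli", A, B), ("one",), ("var", a), ("name", V, [args]).

def subst(ty, env):
    """Capture-free substitution of types for type variables."""
    tag = ty[0]
    if tag == "var":
        return env.get(ty[1], ty)
    if tag in ("plus", "with"):
        return (tag, {label: subst(a, env) for label, a in ty[1].items()})
    if tag in ("tensor", "lolli"):
        return (tag, subst(ty[1], env), subst(ty[2], env))
    if tag == "name":
        return ("name", ty[1], [subst(a, env) for a in ty[2]])
    return ty  # ("one",)

def unfold(sig, ty):
    """One-step unfolding: V[B] becomes A[B/params]; structural types are unchanged."""
    if ty[0] != "name":
        return ty
    params, body = sig[ty[1]]
    return subst(body, dict(zip(params, ty[2])))

QUEUE_SIG = {"queue": (["a"], ("with", {
    "ins": ("lolli", ("var", "a"), ("name", "queue", [("var", "a")])),
    "del": ("plus", {"none": ("one",),
                     "some": ("tensor", ("var", "a"), ("name", "queue", [("var", "a")]))}),
}))}

unfolded = unfold(QUEUE_SIG, ("name", "queue", [("one",)]))   # unfold queue[1]
```

Because the definition of queue is contractive, `unfolded` is an external choice rather than another type name, so a second call to `unfold` returns it unchanged.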


Definition 2. Let Type be the set of closed type expressions (no free variables).
A relation R ⊆ Type × Type is a type bisimulation if (A, B) ∈ R implies:


— If unfold_Σ(A) = ⊕{ℓ : A_ℓ}_{ℓ ∈ L}, then unfold_Σ(B) = ⊕{ℓ : B_ℓ}_{ℓ ∈ L} and also
(A_ℓ, B_ℓ) ∈ R for all ℓ ∈ L.

— If unfold_Σ(A) = &{ℓ : A_ℓ}_{ℓ ∈ L}, then unfold_Σ(B) = &{ℓ : B_ℓ}_{ℓ ∈ L} and also
(A_ℓ, B_ℓ) ∈ R for all ℓ ∈ L.

— If unfold_Σ(A) = A1 ⊗ A2, then unfold_Σ(B) = B1 ⊗ B2 and (A1, B1) ∈ R and
(A2, B2) ∈ R.

— If unfold_Σ(A) = A1 ⊸ A2, then unfold_Σ(B) = B1 ⊸ B2 and (A1, B1) ∈ R
and (A2, B2) ∈ R.

— If unfold_Σ(A) = 1, then unfold_Σ(B) = 1.


Definition 3. Two closed types A and B are equal (A ≡ B) iff there exists a
type bisimulation R such that (A, B) ∈ R.


When the signature Σ is not clear from context we add a subscript, A ≡_Σ B.
This definition only applies to types with no free type variables. Since we allow
parameters in type definitions, we need to define equality in the presence of
free type variables. To this end, we define the notation ∀𝒱. A ≡ B where 𝒱 is
a collection of type variables and A and B are valid types w.r.t. 𝒱 (i.e., free
variables in A and B are contained in 𝒱).


Definition 4. We define ∀𝒱. A ≡ B iff for all closed type substitutions σ : 𝒱,
we have A[σ] ≡ B[σ].


4.2 Decidability of Type Equality 


Solomon [42] proved that types defined using parametric type definitions with 
an inductive interpretation can be translated to DPDAs, thus reducing type 
equality to language equality on DPDAs. However, our type definitions have a 
coinductive interpretation. As an example, consider the types A ≜ ⊕{a : A} and
B ≜ ⊕{b : B}. With an inductive interpretation, types A and B are empty
(because they do not have terminating symbols) and, thus, are equal. However, 
with a coinductive interpretation, type A will send an infinite number of a’s, 
and B will send an infinite number of b’s, and are thus not equal. Our reduction 
needs to account for this coinductive behavior. 

We show that type equality of nested session types is decidable via a reduction
to the trace equivalence problem of deterministic first-order grammars [30]. A
first-order grammar is a structure (𝒩, 𝒜, 𝒮) where 𝒩 is a set of non-terminals,
𝒜 is a finite set of actions, and 𝒮 is a finite set of production rules. The arity
of non-terminal X ∈ 𝒩 is written as arity(X) ∈ ℕ. Production rules rely on a
countable set of variables 𝒱, and on the set 𝒯_𝒩 of regular terms over 𝒩 ∪ 𝒱. A
term is regular if the set of subterms is finite (see [30]).

Each production rule has the form \(X\overline{\alpha} \xrightarrow{a} E\) where X ∈ 𝒩 is a non-terminal,
a ∈ 𝒜 is an action, and ᾱ ∈ 𝒱* are variables that the term E ∈ 𝒯_𝒩 can refer
to. A grammar is deterministic if for each pair of X ∈ 𝒩 and a ∈ 𝒜, there
is at most one rule of the form \(X\overline{\alpha} \xrightarrow{a} E\) in 𝒮. The substitution of terms B̄
for variables ᾱ in a rule \(X\overline{\alpha} \xrightarrow{a} E\), denoted by \(X\overline{B} \xrightarrow{a} E[\overline{B}/\overline{\alpha}]\), is the rule
\((X\overline{\alpha} \xrightarrow{a} E)[\overline{B}/\overline{\alpha}]\). Given a set of rules 𝒮, the trace of a term T is defined as

\[
\mathsf{traces}_{\mathcal{S}}(T) = \{\overline{a} \in \mathcal{A}^* \mid (T \xrightarrow{\overline{a}} T') \in \mathcal{S}, \text{ for some } T' \in \mathcal{T}_{\mathcal{N}}\}.
\]

Two terms are trace equivalent, written as T ∼_𝒮 T', if traces_𝒮(T) = traces_𝒮(T').

The crux of the reduction lies in the observation that session types can be
translated to terms and type definitions can be translated to production rules of a
first-order grammar. We start the translation of nested session types to grammars
by first making an initial pass over the signature and introducing fresh internal
names such that the new type definitions alternate between structural (except
1 and α) and non-structural types. These internal names are parameterized
over their free type variables, and their definitions are added to the signature. 
This internal renaming simplifies the next step where we translate this extended 
signature to grammar production rules. 


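Example 1. As a running example, consider the queue type from Section 2:

\[
Q[\alpha] \triangleq \&\{\mathsf{ins} : \alpha \multimap Q[\alpha],\ \mathsf{del} : \oplus\{\mathsf{none} : \mathbf{1},\ \mathsf{some} : \alpha \otimes Q[\alpha]\}\}
\]

After performing internal renaming for this type, we obtain the following
signature:

\[
\begin{array}{ll}
Q[\alpha] \triangleq \&\{\mathsf{ins} : X_0[\alpha],\ \mathsf{del} : X_1[\alpha]\} &
X_1[\alpha] \triangleq \oplus\{\mathsf{none} : \mathbf{1},\ \mathsf{some} : X_2[\alpha]\} \\
X_0[\alpha] \triangleq \alpha \multimap Q[\alpha] &
X_2[\alpha] \triangleq \alpha \otimes Q[\alpha]
\end{array}
\]

We introduce the fresh internal names X0, X1 and X2 (parameterized with free
variable α) to represent the continuation type in each case. Note the alternation
between structural and non-structural types (of the form V[B̄]).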

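Next, we translate this extended signature to the grammar G = (𝒩, 𝒜, 𝒮)
aimed at reproducing the behavior prescribed by the types as grammar actions.

\[
\begin{array}{rcl}
\mathcal{N} & = & \{Q, X_0, X_1, X_2, \bot\} \\
\mathcal{A} & = & \{\&\mathsf{ins},\ \&\mathsf{del},\ \multimap_1,\ \multimap_2,\ \oplus\mathsf{none},\ \oplus\mathsf{some},\ \otimes_1,\ \otimes_2\} \\
\mathcal{S} & = & \{Q\alpha \xrightarrow{\&\mathsf{ins}} X_0\alpha,\ \ Q\alpha \xrightarrow{\&\mathsf{del}} X_1\alpha,\ \ X_0\alpha \xrightarrow{\multimap_1} \alpha,\ \ X_0\alpha \xrightarrow{\multimap_2} Q\alpha, \\
 & & \ \ X_1\alpha \xrightarrow{\oplus\mathsf{none}} \bot,\ \ X_1\alpha \xrightarrow{\oplus\mathsf{some}} X_2\alpha,\ \ X_2\alpha \xrightarrow{\otimes_1} \alpha,\ \ X_2\alpha \xrightarrow{\otimes_2} Q\alpha\}
\end{array}
\]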


Essentially, each defined type name is translated to a fresh non-terminal. Each
type definition then corresponds to a sequence of rules: one for each possible
continuation type, with the appropriate label that leads to that continuation. For
instance, the type Q[α] has two possible continuations: transition to X0[α] with
action &ins or to X1[α] with action &del. The rules for all other type names are
analogous. When the continuation is 1, we transition to the nullary non-terminal
⊥, disabling any further action. When the continuation is α, we transition to α.
Since each type name is defined once, the produced grammar is deterministic.

Formally, the translation from an (extended) signature to a grammar is handled
by two simultaneous tasks: translating type definitions into production rules
(function τ below), and converting type names, variables and the terminated
session into grammar terms (function ⟨·⟩). The function ⟨·⟩ : OType → 𝒯_𝒩 from
open session types to grammar terms is defined by:

\[
\begin{array}{rcll}
\langle \mathbf{1} \rangle & = & \bot & \text{type } \mathbf{1} \text{ translates to } \bot \\
\langle \alpha \rangle & = & \alpha & \text{type variables translate to themselves} \\
\langle V[B_1, \ldots, B_n] \rangle & = & V \langle B_1 \rangle \cdots \langle B_n \rangle & \text{type names translate to first-order terms}
\end{array}
\]

Due to this mapping, throughout this section we will use type names interchangeably
as type names or as non-terminal first-order symbols.

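The function τ converts a type definition V[ᾱ] ≜ A into a set of production
rules and is defined according to the structure of A as follows:

\[
\begin{array}{rcl}
\tau(V[\overline{\alpha}] \triangleq \oplus\{\ell : A_\ell\}_{\ell \in L}) & = & \{\langle V[\overline{\alpha}] \rangle \xrightarrow{\oplus\ell} \langle A_\ell \rangle \mid \ell \in L\} \\
\tau(V[\overline{\alpha}] \triangleq \&\{\ell : A_\ell\}_{\ell \in L}) & = & \{\langle V[\overline{\alpha}] \rangle \xrightarrow{\&\ell} \langle A_\ell \rangle \mid \ell \in L\} \\
\tau(V[\overline{\alpha}] \triangleq A_1 \otimes A_2) & = & \{\langle V[\overline{\alpha}] \rangle \xrightarrow{\otimes_i} \langle A_i \rangle \mid i = 1, 2\} \\
\tau(V[\overline{\alpha}] \triangleq A_1 \multimap A_2) & = & \{\langle V[\overline{\alpha}] \rangle \xrightarrow{\multimap_i} \langle A_i \rangle \mid i = 1, 2\}
\end{array}
\]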


The function τ identifies the actions and continuation types corresponding to
A and translates them to grammar rules. Internal and external choices lead to
actions ⊕ℓ and &ℓ, for each ℓ ∈ L, with A_ℓ as the continuation type. The type
A1 ⊗ A2 enables two possible actions, ⊗1 and ⊗2, with continuation A1 and
A2 respectively. Similarly A1 ⊸ A2 produces the actions ⊸1 and ⊸2 with A1
and A2 as respective continuations. Contractiveness ensures that there are no
definitions of the form V[ᾱ] ≜ V'[B̄]. Our internal renaming ensures that we
do not encounter cases of the form V[ᾱ] ≜ 1 or V[ᾱ] ≜ α because we do not
generate internal names for them. For the same reason, the ⟨·⟩ function is only
defined on the complement types 1, α and V[B̄].

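The τ function is extended to translate a signature by being applied pointwise.
Formally, \(\tau(\Sigma) = \bigcup_{(V[\overline{\alpha}] \triangleq A) \in \Sigma} \tau(V[\overline{\alpha}] \triangleq A)\). Connecting all pieces, we
define the fog function that translates a signature to a grammar as:

\[
\begin{array}{l}
\mathsf{fog}(\Sigma) = (\mathcal{N}, \mathcal{A}, \mathcal{S}), \text{ where: } \quad \mathcal{S} = \tau(\Sigma) \\
\mathcal{N} = \{X \mid (X\overline{\alpha} \xrightarrow{a} E) \in \tau(\Sigma)\} \qquad
\mathcal{A} = \{a \mid (X\overline{\alpha} \xrightarrow{a} E) \in \tau(\Sigma)\}
\end{array}
\]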


The grammar is constructed by first computing τ(Σ) to obtain all the production
rules. 𝒩 and 𝒜 are constructed by collecting the set of non-terminals and actions
from these rules. The finite representation of session types and uniqueness of
definitions ensure that fog(Σ) is a deterministic first-order grammar.
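
To illustrate the translation, the sketch below applies it to the renamed queue signature of Example 1. The encoding is an assumption made for this example: ASCII strings stand in for the symbols (`+` for ⊕, `*` for ⊗, `-o` for ⊸, `bot` for ⊥), and the function names `term`, `tau` and `fog` are chosen to mirror ⟨·⟩, τ and fog.

```python
# Assumed encoding of the extended signature: each body is structural and its
# components are "1", a type variable (any other string), or (name, arg, ...).
SIG = {
    "Q":  (("a",), ("&", {"ins": ("X0", "a"), "del": ("X1", "a")})),
    "X0": (("a",), ("-o", "a", ("Q", "a"))),
    "X1": (("a",), ("+", {"none": "1", "some": ("X2", "a")})),
    "X2": (("a",), ("*", "a", ("Q", "a"))),
}

def term(ty):
    """The <.> function: 1 becomes bot, variables stay, names become terms."""
    if ty == "1":
        return ("bot",)
    if isinstance(ty, str):
        return ty
    return (ty[0],) + tuple(term(arg) for arg in ty[1:])

def tau(name, params, body):
    """Production rules (lhs, action, rhs) for one type definition."""
    lhs = (name,) + tuple(params)
    kind = body[0]
    if kind in ("+", "&"):                       # one rule per label
        return {(lhs, kind + label, term(cont)) for label, cont in body[1].items()}
    # tensor or lolli: one rule per component, indexed 1 and 2
    return {(lhs, f"{kind}{i}", term(cont)) for i, cont in enumerate(body[1:], start=1)}

def fog(sig):
    """Apply tau pointwise and collect the actions that occur in the rules."""
    rules = set()
    for name, (params, body) in sig.items():
        rules |= tau(name, params, body)
    actions = {action for _, action, _ in rules}
    return actions, rules

ACTIONS, RULES = fog(SIG)
```

The result contains the eight rules of the running example, and no non-terminal has two rules with the same action, which is the determinism property stated above.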

Checking equality of types A and B given signature Σ finally reduces to
(i) internal renaming of Σ to produce Σ', and (ii) checking trace-equivalence of
terms ⟨A⟩ and ⟨B⟩ given grammar fog(Σ'). If A and B are themselves structural,
we generate internal names for them also during the internal renaming process.
Since we assume an equirecursive and non-generative view of types, it is easy
to show that internal renaming does not alter the communication behavior of
types and preserves type equality. Formally, A ≡_Σ B iff A ≡_Σ' B.


Theorem 1. A ≡_Σ B if and only if ⟨A⟩ ∼_𝒮 ⟨B⟩, where (𝒩, 𝒜, 𝒮) = fog(Σ')
and Σ' is the extended signature for Σ.


Proof. For the direct implication, assume that ⟨A⟩ ≁_𝒮 ⟨B⟩. Pick a sequence of
actions in the difference of the traces and let w0 be its greatest prefix occurring
in both traces. Either w0 is a maximal trace for one of the terms, or we have
\(\langle A \rangle \xrightarrow{w_0} \langle A' \rangle\) and \(\langle B \rangle \xrightarrow{w_0} \langle B' \rangle\), with \(\langle A' \rangle \xrightarrow{a_1} \langle A'' \rangle\) and \(\langle B' \rangle \xrightarrow{a_2} \langle B'' \rangle\), where
a1 ≠ a2. In both cases, with a simple case analysis on the definition of the
translation τ, we conclude that A' ≢ B' and so A ≢ B. For the reciprocal
implication, assume that ⟨A⟩ ∼_𝒮 ⟨B⟩. Consider the relation

\[
R = \{(A_0, B_0) \mid \mathsf{traces}_{\mathcal{S}}(\langle A_0 \rangle) = \mathsf{traces}_{\mathcal{S}}(\langle B_0 \rangle)\} \subseteq \mathit{Type} \times \mathit{Type}.
\]

Obviously, (A, B) ∈ R. To prove that R is a type bisimulation, let (A0, B0) ∈
R and proceed by case analysis on A0 and B0. For the case unfold_Σ(A0) =
⊕{ℓ : A_ℓ}_{ℓ ∈ L}, we have \(\langle A_0 \rangle \xrightarrow{\oplus\ell} \langle A_\ell \rangle\). Since, by hypothesis, the traces coincide,
traces_𝒮(⟨A0⟩) = traces_𝒮(⟨B0⟩), we have \(\langle B_0 \rangle \xrightarrow{\oplus\ell} \langle B_\ell \rangle\) and, thus, unfold_Σ(B0) =
⊕{ℓ : B_ℓ}_{ℓ ∈ L}. Moreover, Jančar [30] proves that traces_𝒮(⟨A_ℓ⟩) = traces_𝒮(⟨B_ℓ⟩).
Hence, (A_ℓ, B_ℓ) ∈ R. The other cases and a detailed proof can be found in [14].


However, type equality is not only restricted to closed types (see Definition 4).
To decide equality for open types, i.e., ∀𝒱. A ≡ B given signature Σ, we introduce
a fresh label ℓ_α and type A_α for each α ∈ 𝒱. We extend the signature with type
definitions: Σ* = Σ ∪_{α ∈ 𝒱} {A_α ≜ ⊕{ℓ_α : A_α}}. We then replace all occurrences
of α in A and B with A_α and check their equality with signature Σ*. We prove
that this substitution preserves equality.
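
The closing construction can be sketched as follows. The tagged-tuple encoding of types and the naming scheme for the fresh type names and labels (`A_` and `l_` prefixes) are assumptions for illustration; a real implementation would need its own way of guaranteeing freshness.

```python
# Assumed encoding: ("plus", {label: A}), ("with", {label: A}), ("tensor", A, B),
# ("lolli", A, B), ("one",), ("var", a), ("name", V, [args]).

def subst(ty, env):
    """Replace type variables in ty according to env."""
    tag = ty[0]
    if tag == "var":
        return env.get(ty[1], ty)
    if tag in ("plus", "with"):
        return (tag, {label: subst(a, env) for label, a in ty[1].items()})
    if tag in ("tensor", "lolli"):
        return (tag, subst(ty[1], env), subst(ty[2], env))
    if tag == "name":
        return ("name", ty[1], [subst(a, env) for a in ty[2]])
    return ty  # ("one",)

def close_open_type(sig, variables, ty):
    """Extend sig with A_v = +{l_v : A_v} for each variable v and substitute."""
    extended = dict(sig)
    sigma_star = {}
    for v in variables:
        fresh = "A_" + v                      # hypothetical fresh type name
        assert fresh not in extended, "fresh name clashes with the signature"
        extended[fresh] = ([], ("plus", {"l_" + v: ("name", fresh, [])}))
        sigma_star[v] = ("name", fresh, [])
    return extended, subst(ty, sigma_star)

open_ty = ("tensor", ("var", "a"), ("name", "V", [("var", "b")]))
SIG_STAR, closed_ty = close_open_type({"V": (["x"], ("one",))}, ["a", "b"], open_ty)
```

Each fresh type can only ever emit its own private label, so two distinct variables are mapped to types that differ at their first action, while the same variable is always mapped to the same type.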


Theorem 2. ∀𝒱. A ≡_Σ B iff A[σ*] ≡_Σ* B[σ*] where σ*(α) = A_α for all α ∈ 𝒱.


Proof (Sketch). The direct implication is trivial since σ* is a closed substitution.
Reciprocally, we assume that ∀𝒱. A ≢_Σ B. Then there must exist some substitution
σ' such that A[σ'] ≢_Σ B[σ']. We use this constraint to prove that
A[σ*] ≢_Σ* B[σ*]. The exact details can be found in our tech report [14].


Theorem 3. Checking ∀𝒱. A ≡ B is decidable.


Proof. Theorem 2 reduces equality of open types to equality of closed types.
Theorem 1 reduces equality of closed nested session types to trace equivalence
of first-order grammars. Jančar [30] proved that trace equivalence for first-order
grammars is decidable, hence establishing the decidability of equality for nested
session types.


5 Practical Algorithm for Type Equality 


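Although type equality can be reduced to trace equivalence for first-order
grammars (Theorem 1 and Theorem 2), the latter problem has a very high
theoretical complexity with no known practical algorithm [30]. In response, we have
designed a coinductive algorithm for approximating type equality. Taking
inspiration from Gay and Hole [23], we attempt to construct a bisimulation. Our
proposed algorithm is sound but incomplete and can terminate in three states:
(i) types are proved equal by constructing a bisimulation, (ii) counterexample
detected by identifying a position where types differ, or (iii) terminated without
a conclusive answer due to incompleteness. We interpret both (ii) and (iii) as a
failure of type-checking (but there is a recourse; see Section 5.1). The algorithm

\[
\frac{\mathcal{V} ; \Gamma \vdash A_\ell \equiv B_\ell \quad (\forall \ell \in L)}
     {\mathcal{V} ; \Gamma \vdash \oplus\{\ell : A_\ell\}_{\ell \in L} \equiv \oplus\{\ell : B_\ell\}_{\ell \in L}}\ \oplus
\qquad
\frac{\mathcal{V} ; \Gamma \vdash A_\ell \equiv B_\ell \quad (\forall \ell \in L)}
     {\mathcal{V} ; \Gamma \vdash \&\{\ell : A_\ell\}_{\ell \in L} \equiv \&\{\ell : B_\ell\}_{\ell \in L}}\ \&
\]
\[
\frac{\mathcal{V} ; \Gamma \vdash A_1 \equiv B_1 \quad \mathcal{V} ; \Gamma \vdash A_2 \equiv B_2}
     {\mathcal{V} ; \Gamma \vdash A_1 \otimes A_2 \equiv B_1 \otimes B_2}\ \otimes
\qquad
\frac{\mathcal{V} ; \Gamma \vdash A_1 \equiv B_1 \quad \mathcal{V} ; \Gamma \vdash A_2 \equiv B_2}
     {\mathcal{V} ; \Gamma \vdash A_1 \multimap A_2 \equiv B_1 \multimap B_2}\ \multimap
\]
\[
\frac{}{\mathcal{V} ; \Gamma \vdash \mathbf{1} \equiv \mathbf{1}}\ \mathbf{1}
\qquad
\frac{\alpha \in \mathcal{V}}{\mathcal{V} ; \Gamma \vdash \alpha \equiv \alpha}\ \mathsf{var}
\qquad
\frac{\mathcal{V} ; \Gamma \vdash \overline{A} \equiv \overline{A'}}
     {\mathcal{V} ; \Gamma \vdash V[\overline{A}] \equiv V[\overline{A'}]}\ \mathsf{refl}
\]
\[
\frac{V_1[\overline{\alpha_1}] \triangleq A \in \Sigma \quad
      V_2[\overline{\alpha_2}] \triangleq B \in \Sigma \quad
      C = \langle \mathcal{V} ; V_1[\overline{A_1}] \equiv V_2[\overline{A_2}] \rangle \quad
      \mathcal{V} ; \Gamma, C \vdash_\Sigma A[\overline{A_1}/\overline{\alpha_1}] \equiv B[\overline{A_2}/\overline{\alpha_2}]}
     {\mathcal{V} ; \Gamma \vdash_\Sigma V_1[\overline{A_1}] \equiv V_2[\overline{A_2}]}\ \mathsf{expd}
\]
\[
\frac{\langle \mathcal{V}' ; V_1[\overline{A_1'}] \equiv V_2[\overline{A_2'}] \rangle \in \Gamma \quad
      \exists \sigma' : \mathcal{V}'.\ \big(\mathcal{V} ; \Gamma \Vdash V_1[\overline{A_1'}[\sigma']] \equiv V_1[\overline{A_1}]
      \ \wedge\ \mathcal{V} ; \Gamma \Vdash V_2[\overline{A_2'}[\sigma']] \equiv V_2[\overline{A_2}]\big)}
     {\mathcal{V} ; \Gamma \vdash V_1[\overline{A_1}] \equiv V_2[\overline{A_2}]}\ \mathsf{def}
\]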


Fig. 1. Algorithmic Rules for Type Equality 


is deterministic (no backtracking) and the implementation is quite efficient in 
practice. For all our examples, type checking is instantaneous (see Section 8).

The fundamental operation in the equality algorithm is loop detection where 
we determine if we have already added an equation A = B to the bisimulation 
we are constructing. Due to the presence of open types with free type variables, 
determining if we have considered an equation already becomes a difficult opera- 
tion. To that purpose, we make an initial pass over the given types and introduce 
fresh internal names as described in Example 1 (but also for 1 and α for simplicity).
In the resulting signature, defined type names and structural types alternate
and we can perform loop detection entirely on defined type names (whether in- 
ternal or external). The formal rules for this internal renaming are described in 
the technical report [14]. 

Based on the invariants established by internal names, the algorithm only
needs to alternately compare two type names or two structural types. The rules
are shown in Figure 1. The judgment has the form 𝒱 ; Γ ⊢_Σ A ≡ B where
𝒱 contains the free type variables in the types A and B, Σ is a fixed valid
signature containing type definitions of the form V[ᾱ] ≜ C, and Γ is a collection
of closures ⟨𝒱 ; V1[Ā1] ≡ V2[Ā2]⟩. If a derivation can be constructed, all closed
instances of all closures are included in the resulting bisimulation (see the proof
of Theorem 4). A closed instance of closure ⟨𝒱 ; V1[Ā1] ≡ V2[Ā2]⟩ is obtained by
applying a closed substitution σ over variables in 𝒱, i.e., V1[Ā1[σ]] ≡ V2[Ā2[σ]]
such that the types V1[Ā1[σ]] and V2[Ā2[σ]] have no free type variables. Because
the signature Σ is fixed, we elide it from the rules in Figure 1.

In the type equality algorithm, the rules for type operators simply compare
the components. If the type constructors (or the label sets in the ⊕ and & rules)
do not match, then type equality fails, having constructed a counterexample to
bisimulation. Similarly, two type variables are considered equal iff they have the
same name, as exemplified by the var rule.


The rule of reflexivity is needed explicitly here (but not in the version of Gay 
and Hole) due to the incompleteness of the algorithm: we may otherwise fail to 
recognize type names parameterized with equal types as equal. Note that the 
refl rule checks a sequence of types in the premise. 


Now we come to the key rules, expd and def. In the expd rule we expand the
definitions of V1[Ā1] and V2[Ā2], and add the closure ⟨𝒱 ; V1[Ā1] ≡ V2[Ā2]⟩ to
Γ. Since the equality of V1[Ā1] and V2[Ā2] must hold for all its closed instances,
the extension of Γ with the corresponding closure remembers exactly that.


The def rule only applies when there already exists a closure in Γ with the
same type names V1 and V2. In that case, we try to find a substitution σ' over 𝒱'
such that V1[Ā1] is equal to V1[Ā1'[σ']] and V2[Ā2] is equal to V2[Ā2'[σ']].
Immediately after, the refl rule applies and recursively calls the equality algorithm on
both type parameters. The substitution σ' is computed by a standard matching
algorithm on first-order terms (which is linear-time), applied on the syntactic
structure of the types. Existence of such a substitution ensures that any closed
instance of ⟨𝒱 ; V1[Ā1] ≡ V2[Ā2]⟩ is also a closed instance of ⟨𝒱' ; V1[Ā1'] ≡ V2[Ā2']⟩,
which are already present in the constructed type bisimulation, and we can
terminate our equality check, having successfully detected a loop.


The algorithm so far is sound, but potentially non-terminating. There are
two points of non-termination: (i) when encountering name/name equations, we
can use the expd rule indefinitely, and (ii) we call the type equality recursively
in the def rule. To ensure termination in the former case, we restrict the expd
rule so that for any pair of type names V1 and V2 there is an upper bound on
the number of closures of the form ⟨_ ; V1[_] ≡ V2[_]⟩ allowed in Γ. We define
this upper bound as the depth bound of the algorithm and allow the programmer
to specify this depth bound. Surprisingly, a depth bound of 1 suffices for all
of our examples. In the latter case, instead of calling the general type equality
algorithm, we introduce the notion of rigid equality, denoted by 𝒱 ; Γ ⊩ A ≡ B.
The only difference between general and rigid equality is that we cannot employ
the expd rule for rigid equality. Since the size of the types reduces in all equality
rules except for expd, this algorithm terminates. When comparing two instantiated
type names, our algorithm first tries reflexivity, then tries to close a loop
with def, and only if neither of these is applicable or both fail do we expand the
definitions with the expd rule. Note that if type names have no parameters, our
algorithm specializes to Gay and Hole's (with the small optimizations of reflexivity
and internal naming), which means our algorithm is sound and complete
on monomorphic types.
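
For the monomorphic special case mentioned last, the algorithm can be sketched in a few lines. This is a sketch of the parameter-free case only, in the style of Gay and Hole, and not of the full algorithm of Figure 1: closures degenerate to pairs of type names, no matching or depth bound is needed, and the encoding and the name `equal` are assumptions. Every continuation is assumed to be a type name, as arranged by internal renaming.

```python
# Assumed encoding of a parameter-free signature: name -> structural body, where
# a body is ("plus"/"with", {label: name}), ("tensor"/"lolli", name, name) or ("one",).

def equal(sig, a, b, seen=None):
    """Try to build a bisimulation containing (a, b) by loop detection."""
    seen = set() if seen is None else seen
    if a == b or (a, b) in seen:      # reflexivity, or a loop closed (def)
        return True
    seen.add((a, b))                  # remember the equation, then expand (expd)
    body_a, body_b = sig[a], sig[b]
    if body_a[0] != body_b[0]:        # different type constructors
        return False
    if body_a[0] in ("plus", "with"):
        if body_a[1].keys() != body_b[1].keys():   # label sets must agree
            return False
        return all(equal(sig, body_a[1][l], body_b[1][l], seen) for l in body_a[1])
    # tensor, lolli, or one: compare the components pairwise
    return all(equal(sig, x, y, seen) for x, y in zip(body_a[1:], body_b[1:]))

SIG = {
    "A":  ("plus", {"a": "A"}),       # A  = +{a : A}
    "A2": ("plus", {"a": "A2"}),      # same structure under another name
    "B":  ("plus", {"b": "B"}),       # B  = +{b : B}
}
```

On this signature `equal(SIG, "A", "A2")` succeeds after closing the loop on the pair (A, A2), whereas `equal(SIG, "A", "B")` fails on the label sets, matching the coinductive reading of the example in Section 4.2. Since there are only finitely many pairs of names, the sketch always terminates.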


Soundness. We establish the soundness of the equality algorithm by constructing
a type bisimulation from a derivation of 𝒱 ; · ⊢ A ≡ B by (i) collecting the
conclusions of all the sequents, and (ii) forming all closed instances from them.

Definition 5. Given a derivation D of 𝒱 ; Γ ⊢ A ≡ B, we define the set S(D)
of closures. For each sequent (regular or rigid) of the form 𝒱 ; Γ ⊢ A ≡ B in D,
we include the closure ⟨𝒱 ; A ≡ B⟩ in S(D).


Theorem 4 (Soundness). If 𝒱 ; · ⊢ A ≡ B, then ∀𝒱. A ≡ B. Consequently,
if 𝒱 is empty, we get A ≡ B.


Proof. Given a derivation D0 of 𝒱0 ; · ⊢ A0 ≡ B0, construct S(D0) and define
relation R0 as follows:

\[
R_0 = \{(A[\sigma], B[\sigma]) \mid \langle \mathcal{V} ; A \equiv B \rangle \in S(D_0) \text{ and } \sigma \text{ over } \mathcal{V}\}
\]

Then, construct R_i (i ≥ 1) as follows:

\[
R_i = \{(V[\overline{A}], V[\overline{B}]) \mid V[\overline{\alpha}] \triangleq C \in \Sigma \text{ and } (A^j, B^j) \in R_{i-1}\ \forall j \in 1..|\overline{\alpha}|\}
\]

Consider R to be the reflexive transitive closure of \(\bigcup_{i \geq 0} R_i\). Note that extending
a relation by its reflexive transitive closure preserves its bisimulation properties
since the bisimulation is strong. If R is a type bisimulation, then our theorem
follows since the closure ⟨𝒱0 ; A0 ≡ B0⟩ ∈ S(D0), and hence, for any closed
substitution σ, (A0[σ], B0[σ]) ∈ R.

All that remains is to prove that R is a type bisimulation. We achieve this via
a case analysis on the rule that added a pair (A, B) to R. The complete proof
is described in the technical report [14].


5.1 Type Equality Declarations 


One of the primary sources of incompleteness in our algorithm is its inability to 
generalize the coinductive hypothesis. As an illustration, consider the following 
two types D and D’, which only differ in the names, but have the same structure. 


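\[
\begin{array}{ll}
T[x] \triangleq \oplus\{\mathsf{L} : T[T[x]],\ \mathsf{R} : x\} & D \triangleq \oplus\{\mathsf{L} : T[D],\ \mathsf{\$} : \mathbf{1}\} \\
T'[x] \triangleq \oplus\{\mathsf{L} : T'[T'[x]],\ \mathsf{R} : x\} & D' \triangleq \oplus\{\mathsf{L} : T'[D'],\ \mathsf{\$} : \mathbf{1}\}
\end{array}
\]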


To establish D ≡ D', our algorithm explores the L branch and checks T[D] ≡
T'[D']. A corresponding closure ⟨· ; T[D] ≡ T'[D']⟩ is added to Γ, and our
algorithm then checks T[T[D]] ≡ T'[T'[D']]. This process repeats until it exceeds the
depth bound and terminates with an inconclusive answer. What the algorithm
never realizes is that T[x] ≡ T'[x] for all x ∈ Type; it fails to generalize to this
hypothesis and is always inserting closed equality constraints to Γ.

To allow a recourse, we permit the programmer to declare (concrete syntax)


eqtype T[x] = T'[x]


an equality constraint easily verified by our algorithm. We then seed the Γ in
the equality algorithm with the corresponding closure from the eqtype constraint,
which can then be used to establish D ≡ D'


\[
\cdot\ ;\ \langle x ; T[x] \equiv T'[x] \rangle \vdash D \equiv D'
\]

which, upon exploring the L branch, reduces to

\[
\cdot\ ;\ \langle x ; T[x] \equiv T'[x] \rangle,\ \langle \cdot ; D \equiv D' \rangle \vdash T[D] \equiv T'[D']
\]

which holds under the substitution [D/x] as required by the def rule.

In the implementation, we first collect all the eqtype declarations in the program
into a global set of closures Γ0. We then validate every eqtype declaration
by checking 𝒱 ; Γ0 ⊢ A ≡ B for every pair (A, B) (with free variables 𝒱) in
the eqtype declarations. Essentially, this ensures that all equality declarations
are valid w.r.t. each other. Finally, all equality checks are then performed under
this more general Γ0. The soundness of this approach can be proved with the
following more general theorem.


Theorem 5 (Seeded Soundness). For a valid set of eqtype declarations Γ0,
if 𝒱 ; Γ0 ⊢ A ≡ B, then ∀𝒱. A ≡ B.


Our soundness proof can easily be modified to accommodate this requirement.
Intuitively, since $\Gamma_0$ is valid, all closed instances of $\Gamma_0$ are already proven
to be bisimilar. Thus, all properties of a type bisimulation are still preserved if
all closed instances of $\Gamma_0$ are added to it.

One final note on the rule of reflexivity: a type name may not actually depend
on its parameter. As a simple example, we have $V[\alpha] = \mathbf{1}$; a more complicated
one would be $V[\alpha] = \oplus\{a : V[V[\alpha]],\ b : \mathbf{1}\}$. When applying reflexivity, we would
like to conclude that $V[A] \equiv V[B]$ regardless of A and B. This could be easily
established with an equality type declaration $\mathsf{eqtype}\ V[\alpha] = V[\beta]$. In order to
avoid this syntactic overhead for the programmer, we determine for each
parameter $\alpha$ of each type name V whether its definition is nonvariant in $\alpha$. This
information is recorded in the signature and used when applying the reflexivity
rule by ignoring nonvariant arguments.
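
The text does not say how nonvariance is determined. One plausible way, sketched below in Python under that assumption, is a least fixed point: a parameter counts as used only if it occurs in the definition outside of argument positions that are themselves (so far) unused. The tuple encoding of types and the function name `used_parameters` are hypothetical.

```python
ONE = ("one",)

def used_parameters(defs):
    """Least fixed point over all (type name, parameter index) pairs.

    used[(V, i)] ends up True iff unfolding V can ever expose its i-th
    parameter; a parameter that stays False is nonvariant in this sketch.
    """
    used = {(v, i): False for v, (params, _) in defs.items() for i in range(len(params))}

    def occurs(t, x):
        if t[0] == "var":
            return t[1] == x
        if t[0] == "plus":
            return any(occurs(a, x) for _, a in t[1])
        if t[0] == "name":
            # Only look inside arguments whose parameter is already known to be used.
            return any(used[(t[1], i)] and occurs(a, x) for i, a in enumerate(t[2]))
        return False

    changed = True
    while changed:
        changed = False
        for v, (params, body) in defs.items():
            for i, p in enumerate(params):
                if not used[(v, i)] and occurs(body, p):
                    used[(v, i)] = changed = True
    return used

A = ("var", "a")
Xv = ("var", "x")
defs = {
    # V[a] = +{a : V[V[a]], b : 1}   (never exposes its parameter)
    "V": (("a",), ("plus", (("a", ("name", "V", (("name", "V", (A,)),))), ("b", ONE)))),
    # T[x] = +{L : T[T[x]], R : x}   (exposes x in the R branch)
    "T": (("x",), ("plus", (("L", ("name", "T", (("name", "T", (Xv,)),))), ("R", Xv)))),
}
used = used_parameters(defs)
print(used)
```

With such a table, a reflexivity check on V[A] and V[B] would simply skip the arguments whose entry is False.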


6 Formal Language Description 


In this section, we present the program constructs we have designed to realize
nested polymorphism; these have also been integrated with the Rast language
to support general-purpose programming. The underlying base
system of session types is derived from a Curry-Howard interpretation [7,8] of
intuitionistic linear logic [25]. The key idea is that an intuitionistic linear sequent
$A_1, A_2, \ldots, A_n \vdash A$ is interpreted as the interface to a process P. We label each
of the antecedents with a channel name $x_i$ and the succedent with channel name
z. The $x_i$'s are channels used by P and z is the channel provided by P.


$$(x_1 : A_1), (x_2 : A_2), \ldots, (x_n : A_n) \vdash P :: (z : C)$$


The resulting judgment formally states that process P provides a service of
session type C along channel z, while using the services of session types $A_1, \ldots, A_n$
provided along channels $x_1, \ldots, x_n$ respectively. All these channels must be
distinct. We abbreviate the antecedent of the sequent by $\Delta$.

Due to the presence of type variables, the formal typing judgment is extended
with $\mathcal{V}$ and written as

$$\mathcal{V} \,;\, \Delta \vdash_\Sigma P :: (x : A)$$

where $\mathcal{V}$ stores the type variables $\alpha$, $\Delta$ represents the linear antecedents $x_i : A_i$,
P is the process expression and x : A is the linear succedent. We presuppose and
maintain that all free type variables in $\Delta$, P, and A are contained in $\mathcal{V}$. Finally,
$\Sigma$ is a fixed valid signature containing type and process definitions. Table 1
overviews the session types, their associated process terms, their continuation
(both in types and terms) and operational description. For each type, the first
line describes the provider's viewpoint, while the second line describes the client's
matching but dual viewpoint.


Type                                         Cont.       Process Term                                                 Cont.         Description
$c : \oplus\{\ell : A_\ell\}_{\ell \in L}$   $c : A_k$   $c.k \,;\, P$                                                $P$           send label $k$ on $c$
                                                         $\mathsf{case}\ c\ (\ell \Rightarrow Q_\ell)_{\ell \in L}$   $Q_k$         receive label $k$ on $c$
$c : \&\{\ell : A_\ell\}_{\ell \in L}$       $c : A_k$   $\mathsf{case}\ c\ (\ell \Rightarrow P_\ell)_{\ell \in L}$   $P_k$         receive label $k$ on $c$
                                                         $c.k \,;\, Q$                                                $Q$           send label $k$ on $c$
$c : A \otimes B$                            $c : B$     $\mathsf{send}\ c\ w \,;\, P$                                $P$           send channel $w : A$ on $c$
                                                         $y \leftarrow \mathsf{recv}\ c \,;\, Q_y$                    $Q_y[w/y]$    receive channel $w : A$ on $c$
$c : A \multimap B$                          $c : B$     $y \leftarrow \mathsf{recv}\ c \,;\, P_y$                    $P_y[w/y]$    receive channel $w : A$ on $c$
                                                         $\mathsf{send}\ c\ w \,;\, Q$                                $Q$           send channel $w : A$ on $c$
$c : \mathbf{1}$                             -           $\mathsf{close}\ c$                                          -             send close on $c$
                                                         $\mathsf{wait}\ c \,;\, Q$                                   $Q$           receive close on $c$


Table 1. Session types with operational description

We formalize the operational semantics as a system of multiset rewriting 
rules [9]. We introduce semantic objects proc(c, P) and msg(c, M), which mean
that process P or message M, respectively, provides along channel c. A process configuration
is a multiset of such objects, where any two provided channels are distinct. 


6.1 Basic Session Types 


We briefly review the structural types already existing in the Rast language. The
internal choice type constructor $\oplus\{\ell : A_\ell\}_{\ell \in L}$ is an n-ary labeled generalization
of the additive disjunction $A \oplus B$. Operationally, it requires the provider of
$x : \oplus\{\ell : A_\ell\}_{\ell \in L}$ to send a label $k \in L$ on channel x and continue to
provide type $A_k$. The corresponding process term is written as (x.k ; P) where
the continuation P provides type $x : A_k$. Dually, the client must branch based
on the label received on x using the process term $\mathsf{case}\ x\ (\ell \Rightarrow Q_\ell)_{\ell \in L}$ where $Q_\ell$
is the continuation in the $\ell$-th branch.


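$$\frac{(k \in L) \qquad \mathcal{V} \,;\, \Delta \vdash P :: (x : A_k)}{\mathcal{V} \,;\, \Delta \vdash (x.k \,;\, P) :: (x : \oplus\{\ell : A_\ell\}_{\ell \in L})}\ \oplus R$$


$$\frac{(\forall \ell \in L) \qquad \mathcal{V} \,;\, \Delta, (x : A_\ell) \vdash Q_\ell :: (z : C)}{\mathcal{V} \,;\, \Delta, (x : \oplus\{\ell : A_\ell\}_{\ell \in L}) \vdash \mathsf{case}\ x\ (\ell \Rightarrow Q_\ell)_{\ell \in L} :: (z : C)}\ \oplus L$$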


Communication is asynchronous, so that the provider (c.k ; P) sends a message
k along c and continues as P without waiting for it to be received. As a technical
device to ensure that consecutive messages on a channel arrive in order, the
sender also creates a fresh continuation channel c' so that the message k is
actually represented as $(c.k \,;\, c \leftarrow c')$ (read: send k along c and continue along
c'). When the message k is received along c, we select branch k and also substitute
the continuation channel c' for c.
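
$$(\oplus S) : \mathsf{proc}(c,\ c.k \,;\, P) \mapsto \mathsf{proc}(c', P[c'/c]),\ \mathsf{msg}(c,\ c.k \,;\, c \leftarrow c')$$
$$(\oplus C) : \mathsf{msg}(c,\ c.k \,;\, c \leftarrow c'),\ \mathsf{proc}(d,\ \mathsf{case}\ c\ (\ell \Rightarrow Q_\ell)_{\ell \in L}) \mapsto \mathsf{proc}(d, Q_k[c'/c])$$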
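
The two rules can be exercised on a toy configuration. The following Python sketch is only an illustration under several assumptions: process terms are encoded as tuples, fresh continuation channels come from a hypothetical counter (`c1`, `c2`, ...), and only the two rules for internal choice are implemented, so `close` and `wait` are left untouched.

```python
import itertools

fresh = (f"c{i}" for i in itertools.count(1))  # hypothetical fresh-name supply

def rename(p, old, new):
    """Channel substitution P[new/old] on a tiny process-term language."""
    tag, ch = p[0], (new if p[1] == old else p[1])
    if tag == "sel":                      # ("sel", c, k, P)     ~  c.k ; P
        return ("sel", ch, p[2], rename(p[3], old, new))
    if tag == "case":                     # ("case", c, {k: Q})  ~  case c (l => Q_l)
        return ("case", ch, {k: rename(q, old, new) for k, q in p[2].items()})
    if tag == "wait":                     # ("wait", c, Q)       ~  wait c ; Q
        return ("wait", ch, rename(p[2], old, new))
    return ("close", ch)                  # ("close", c)         ~  close c

def step(config):
    """Apply one rewriting rule in place; return False if none applies."""
    # Send rule: the provider emits a message and moves to a fresh channel.
    for i, (kind, c, body) in enumerate(config):
        if kind == "proc" and body[0] == "sel" and body[1] == c:
            k, cont, c2 = body[2], body[3], next(fresh)
            config[i:i + 1] = [("proc", c2, rename(cont, c, c2)), ("msg", c, (k, c2))]
            return True
    # Receive rule: a waiting case consumes the message and selects branch k.
    for i, (kind, c, body) in enumerate(config):
        if kind != "msg":
            continue
        k, c2 = body
        for j, (kind2, d, q) in enumerate(config):
            if kind2 == "proc" and q[0] == "case" and q[1] == c:
                config[j] = ("proc", d, rename(q[2][k], c, c2))
                del config[i]
                return True
    return False

provider = ("sel", "c", "k1", ("sel", "c", "k2", ("close", "c")))
client = ("case", "c", {"k1": ("case", "c", {"k2": ("wait", "c", ("close", "d"))})})
config = [("proc", "c", provider), ("proc", "d", client)]
while step(config):
    pass
print(config)
```

Because every send allocates a new continuation channel, the second message travels on `c1` rather than `c`, so the client cannot consume the two labels out of order.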


The external choice constructor $\&\{\ell : A_\ell\}_{\ell \in L}$ generalizes additive conjunction
and is the dual of internal choice, reversing the role of the provider and
client. The corresponding rules for statics and dynamics are skipped for brevity
and presented in the technical report [14].

The tensor operator $A \otimes B$ prescribes that the provider of $x : A \otimes B$ sends
a channel, say w of type A, and continues to provide type B. The corresponding
process term is send x w ; P where P is the continuation. Correspondingly, its
client must receive a channel on x using the term $y \leftarrow \mathsf{recv}\ x \,;\, Q$, binding it to
variable y and continuing to execute Q.


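$$\frac{\mathcal{V} \,;\, \Delta \vdash P :: (x : B)}{\mathcal{V} \,;\, \Delta, (y : A) \vdash (\mathsf{send}\ x\ y \,;\, P) :: (x : A \otimes B)}\ \otimes R$$


$$\frac{\mathcal{V} \,;\, \Delta, (y : A), (x : B) \vdash Q :: (z : C)}{\mathcal{V} \,;\, \Delta, (x : A \otimes B) \vdash (y \leftarrow \mathsf{recv}\ x \,;\, Q) :: (z : C)}\ \otimes L$$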


Operationally, the provider (send c d ; P) sends the channel d and the continua- 
tion channel c’ along c as a message and continues with executing P. The client 
receives channel d and continuation channel c’ appropriately substituting them. 


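$$(\otimes S) : \mathsf{proc}(c,\ \mathsf{send}\ c\ d \,;\, P) \mapsto \mathsf{proc}(c', P[c'/c]),\ \mathsf{msg}(c,\ \mathsf{send}\ c\ d \,;\, c \leftarrow c')$$
$$(\otimes C) : \mathsf{msg}(c,\ \mathsf{send}\ c\ d \,;\, c \leftarrow c'),\ \mathsf{proc}(e,\ x \leftarrow \mathsf{recv}\ c \,;\, Q) \mapsto \mathsf{proc}(e, Q[c', d/c, x])$$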


The dual operator $A \multimap B$ allows the provider to receive a channel of type A
and continue to provide type B. The client of $A \multimap B$, on the other hand, sends
the channel of type A and continues to use B, using process terms dual to those of $\otimes$.

The type 1 indicates termination requiring that the provider of x : 1 send a 
close message, formally written as close x followed by terminating the commu- 
nication. Correspondingly, the client of x : 1 uses the term wait x ; Q to wait 
for x to terminate before continuing with executing Q. 

A forwarding process $x \leftrightarrow y$ identifies the channels x and y so that any
further communication along either x or y will be along the unified channel. Its
typing rule corresponds to the logical rule of identity.


$$\frac{}{\mathcal{V} \,;\, y : A \vdash (x \leftrightarrow y) :: (x : A)}\ \mathsf{id}$$

Operationally, a process $c \leftrightarrow d$ forwards any message M that arrives on d to c
and vice-versa. Since channels are used linearly, the forwarding process can then
terminate, ensuring proper renaming, as exemplified in the rules below.

$$(\mathsf{id}^{+}C) : \mathsf{msg}(d, M),\ \mathsf{proc}(c,\ c \leftrightarrow d) \mapsto \mathsf{msg}(c, M[c/d])$$
$$(\mathsf{id}^{-}C) : \mathsf{proc}(c,\ c \leftrightarrow d),\ \mathsf{msg}(e, M(c)) \mapsto \mathsf{msg}(e, M(c)[d/c])$$


We write M(c) to indicate that c must occur in message M ensuring that M is 
the sole client of c. 


Process Definitions Process definitions have the form $\Delta \vdash f[\overline{\alpha}] = P :: (x : A)$
where f is the name of the process and P its definition, with $\Delta$ being the
channels used by f and x : A being the offered channel. In addition, $\overline{\alpha}$ is a
sequence of type variables that $\Delta$, P and A can refer to. These type variables
are implicitly universally quantified at the outermost level and represent prenex
polymorphism. All definitions are collected in the fixed global signature $\Sigma$. For a
valid signature, we require that $\overline{\alpha} \,;\, \Delta \vdash P :: (x : A)$ for every definition, thereby
allowing definitions to be mutually recursive. A new instance of a defined process
f can be spawned with the expression $x \leftarrow f[\overline{A}]\ \overline{y} \,;\, Q$ where $\overline{y}$ is a sequence
of channels matching the antecedents $\Delta$ and $\overline{A}$ is a sequence of types matching
the type variables $\overline{\alpha}$. The newly spawned process will use all variables in $\overline{y}$ and
provide x to the continuation Q.
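
$$\frac{\overline{y' : B'} \vdash f[\overline{\alpha}] = P_f :: (x' : B) \in \Sigma \qquad \Delta' = \overline{(y : B')}[\overline{A}/\overline{\alpha}] \qquad \mathcal{V} \,;\, \Delta, (x : B[\overline{A}/\overline{\alpha}]) \vdash Q :: (z : C)}{\mathcal{V} \,;\, \Delta, \Delta' \vdash (x \leftarrow f[\overline{A}]\ \overline{y} \,;\, Q) :: (z : C)}\ \mathsf{def}$$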


The declaration of f is looked up in the signature $\Sigma$ (first premise), and $\overline{A}$
is substituted for $\overline{\alpha}$ while matching the types in $\Delta'$ and $\overline{y}$ (second premise).
Similarly, the freshly created channel x has type B from the signature with $\overline{A}$
substituted for $\overline{\alpha}$.

The complete set of rules for the type system and the operational semantics 
for our language are presented in [14]. 


6.2 Type Safety 


The extension of session types with nested polymorphism is proved type safe by 
the standard theorems of preservation and progress, also known as session fidelity 
and deadlock freedom. At runtime, a program is represented using a multiset of 
semantic objects denoting processes and messages defined as a configuration. 


$$\mathcal{S} ::= \cdot \mid \mathcal{S}, \mathcal{S}' \mid \mathsf{proc}(c, P) \mid \mathsf{msg}(c, M)$$


We say that proc(c, P) (or msg(c, M)) provides channel c. We stipulate that no
two distinct semantic objects in a configuration provide the same channel.


Type Preservation The key to preservation is defining the rules to type a
configuration. We define a well-typed configuration using the judgment
$\Delta_1 \Vdash_\Sigma \mathcal{S} :: \Delta_2$, denoting that configuration $\mathcal{S}$ uses channels $\Delta_1$ and provides channels
$\Delta_2$. A configuration is always typed w.r.t. a valid signature $\Sigma$. Since the signature
$\Sigma$ is fixed, we elide it from the presentation.


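$$\frac{}{\Delta \Vdash (\cdot) :: \Delta}\ \mathsf{emp} \qquad \frac{\Delta_1 \Vdash \mathcal{S}_1 :: \Delta_2 \qquad \Delta_2 \Vdash \mathcal{S}_2 :: \Delta_3}{\Delta_1 \Vdash (\mathcal{S}_1, \mathcal{S}_2) :: \Delta_3}\ \mathsf{comp}$$

$$\frac{\cdot \,;\, \Delta \vdash P :: (x : A)}{\Delta \Vdash \mathsf{proc}(x, P) :: (x : A)}\ \mathsf{proc} \qquad \frac{\cdot \,;\, \Delta \vdash M :: (x : A)}{\Delta \Vdash \mathsf{msg}(x, M) :: (x : A)}\ \mathsf{msg}$$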


Fig. 2. Typing rules for a configuration 


The rules for typing a configuration are defined in Figure 2. The emp rule
states that an empty configuration does not consume any channels and provides
all channels it uses. The comp rule composes two configurations $\mathcal{S}_1$ and $\mathcal{S}_2$; $\mathcal{S}_1$
provides channels $\Delta_2$ while $\mathcal{S}_2$ uses channels $\Delta_2$. The rule proc creates a singleton
configuration out of a process. Since configurations are runtime objects, they do
not refer to any free variables and $\mathcal{V}$ is empty. The msg rule is analogous.


Global Progress To state progress, we need to define a poised process [39]. A 
process proc(c, P) is poised if it is trying to receive a message on c. Dually, a 
message msg(c, M) is poised if it is sending along c. A configuration is poised if 
every message or process in the configuration is poised. Intuitively, this represents 
that the configuration is trying to communicate externally along one of the 
channels it uses or provides. 


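Theorem 6 (Type Safety). For a well-typed configuration $\Delta_1 \Vdash_\Sigma \mathcal{S} :: \Delta_2$,


(i) (Preservation) If $\mathcal{S} \mapsto \mathcal{S}'$, then $\Delta_1 \Vdash_\Sigma \mathcal{S}' :: \Delta_2$.
(ii) (Progress) Either $\mathcal{S}$ is poised, or $\mathcal{S} \mapsto \mathcal{S}'$.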


Proof. Preservation is proved by case analysis on the rules of operational
semantics. First, we invert the derivation of the current configuration $\mathcal{S}$ and use the
premises to assemble a new derivation for $\mathcal{S}'$. Progress is proved by induction on
the right-to-left typing of $\mathcal{S}$ so that either $\mathcal{S}$ is empty (and therefore poised) or
$\mathcal{S} = (\mathcal{D}, \mathsf{proc}(c, P))$ or $\mathcal{S} = (\mathcal{D}, \mathsf{msg}(c, M))$. By the induction hypothesis, either
$\mathcal{D} \mapsto \mathcal{D}'$ or $\mathcal{D}$ is poised. In the former case, $\mathcal{S}$ takes a step (since $\mathcal{D}$ does). In
the latter case, we analyze the cases for P and M, applying multiple steps of
inversion to show that in each case either $\mathcal{S}$ can take a step or is poised.


7 Relationship to Context-Free Session Types 


As ordinarily formulated, session types express communication protocols that
can be described by regular languages [45]. In particular, the type structure is
necessarily tail recursive. Context-free session types (CFSTs) were introduced by
Thiemann and Vasconcelos [45] as a way to express a class of communication
protocols that are not limited to tail recursion. CFSTs express protocols that
can be described by single-state, real-time DPDAs that use the empty stack
acceptance criterion [1,34].

Despite their name, the essence of CFSTs is not their connection to a particular
subset of the (deterministic) context-free languages. Rather, the essence
of CFSTs is that session types are enriched to admit a notion of sequential com- 
position. Nested session types are strictly more expressive than CFSTs, in the 
sense that there exists a proper fragment of nested session types that is closed 
under a notion of sequential composition. (In keeping with process algebras like 
ACP [2], we define a sequential composition to be an operation that satisfies the 
laws of a right-distributive monoid.) 

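Consider (up to $\alpha,\beta,\eta$-equivalence) the linear, tail functions from types to
types with unary type constructors only:


$$S, T ::= \hat{\lambda}\alpha.\, \alpha \mid \hat{\lambda}\alpha.\, V[S\,\alpha] \mid \hat{\lambda}\alpha.\, \oplus\{\ell : S_\ell\,\alpha\}_{\ell \in L} \mid \hat{\lambda}\alpha.\, \&\{\ell : S_\ell\,\alpha\}_{\ell \in L} \mid \hat{\lambda}\alpha.\, A \otimes (S\,\alpha) \mid \hat{\lambda}\alpha.\, A \multimap (S\,\alpha)$$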


The linear, tail nature of these functions allows the type $\alpha$ to be thought of as
a continuation type for the session. The functions S are closed under function
composition, and the identity function, $\hat{\lambda}\alpha.\, \alpha$, is included in this class of
functions. Moreover, because these functions are tail functions, composition
right-distributes over the various logical connectives in the following sense:


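$$(\hat{\lambda}\alpha.\, V[S\,\alpha]) \circ T = \hat{\lambda}\alpha.\, V[(S \circ T)\,\alpha]$$
$$(\hat{\lambda}\alpha.\, \oplus\{\ell : S_\ell\,\alpha\}_{\ell \in L}) \circ T = \hat{\lambda}\alpha.\, \oplus\{\ell : (S_\ell \circ T)\,\alpha\}_{\ell \in L}$$
$$(\hat{\lambda}\alpha.\, A \otimes (S\,\alpha)) \circ T = \hat{\lambda}\alpha.\, A \otimes ((S \circ T)\,\alpha)$$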


and similarly for $\&$ and $\multimap$. Together with the monoid laws of function
composition, these distributive properties justify defining sequential composition as
$S ; T = S \circ T$.
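
These laws can be checked on concrete instances. The Python sketch below models the linear, tail functions as ordinary Python functions from a continuation type to a type; the constructor helpers (`name`, `plus`, `tensor`, `compose`) and the tuple encoding of types are assumptions made for illustration, and equality is only tested pointwise on one sample continuation.

```python
def identity(k):                 # the identity function on continuation types
    return k

def name(v, s):                  # continuation k  |->  V[S k]
    return lambda k: ("name", v, s(k))

def plus(branches):              # continuation k  |->  +{l : S_l k}
    return lambda k: ("plus", tuple((l, s(k)) for l, s in branches.items()))

def tensor(a, s):                # continuation k  |->  A * (S k)
    return lambda k: ("tensor", a, s(k))

def compose(s, t):               # sequential composition S ; T, defined as S o T
    return lambda k: s(t(k))

S1 = tensor("A", identity)
S2 = name("V", identity)
S = plus({"a": S1, "b": S2})
T = tensor("B", identity)
K = ("one",)                     # sample continuation type

# Right-distributivity over internal choice, checked at K.
lhs = compose(S, T)(K)
rhs = plus({"a": compose(S1, T), "b": compose(S2, T)})(K)

# Monoid laws, checked at K.
left_unit = compose(identity, S)(K) == S(K)
right_unit = compose(S, identity)(K) == S(K)
assoc = compose(compose(S, T), S2)(K) == compose(S, compose(T, S2))(K)
print(lhs == rhs, left_unit, right_unit, assoc)
```

The composite `compose(S, T)` places T only at the end of each branch of S, which is what makes the continuation parameter behave like "the rest of the session".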

This suggests that although many details distinguish our work from CF- 
STs, nested session types cover the essence of sequential composition underlying 
context-free session types. However, even stating a theorem that every CFST 
process can be translated into a well-typed process in our system of nested ses- 
sion types is difficult because the two type systems differ in many details: we 
include $\otimes$ and $\multimap$ as session types, but CFSTs do not; CFSTs use a complex
kinding system to incorporate unrestricted session types and combine session 
types with ordinary function types; the CFST system uses classical typing for 
session types and a procedure of type normalization, whereas our types are in- 
tuitionistic and do not rely on normalization; and the CFST typing rules are 
based on natural deduction, rather than the sequent calculus. With all of these 
differences, a formal translation, theorem, and proof would not be very illuminating
beyond the essence already described here. Empirically, we can also give
analogues of the published examples for CFSTs (see, e.g., the first two examples
of Section 9).

Finally, nested session types are strictly more expressive than CFSTs. Recall
from Section 2 the language $L_3 = \{L^n a R^n a \cup L^n b R^n b \mid n > 0\}$, which can
be expressed using nested session types with two type parameters used in an
essential way. Moreover, Korenjak and Hopcroft [34] observe that this language
cannot be recognized by a single-state, real-time DPDA that uses empty stack
acceptance, and thus, CFSTs cannot express the language L3. More broadly, 
nested types allow for finitely many states and acceptance by empty stack or 
final state, while CFSTs only allow a single state and empty stack acceptance. 
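
To see why more than one control state helps, the following Python sketch recognizes the language from the paragraph above with a counter standing in for the stack and a small finite control that remembers whether `a` or `b` was read. It illustrates the automaton-level intuition only and is not how the nested types encode the language; the function name and the state names are hypothetical.

```python
def accepts_l3(word):
    """Recognize { L^n a R^n a } union { L^n b R^n b } for n > 0, one symbol per step."""
    stack = 0            # height of a stack holding one symbol per L read
    state = "push"       # finite control: "push", "pop_a", "pop_b", "done"
    for sym in word:
        if state == "push" and sym == "L":
            stack += 1
        elif state == "push" and sym in ("a", "b") and stack > 0:
            state = "pop_" + sym             # remember the middle letter in the state
        elif state in ("pop_a", "pop_b") and sym == "R" and stack > 0:
            stack -= 1
        elif state in ("pop_a", "pop_b") and sym == state[-1] and stack == 0:
            state = "done"                   # final letter must match the remembered one
        else:
            return False
    return state == "done"

print(accepts_l3("LLaRRa"), accepts_l3("LaRb"))
```

The remembered letter is exactly the information that a single-state machine has nowhere to keep once the stack has been emptied.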


8 Implementation 


We have implemented a prototype for nested session types and integrated it
with the open-source Rast system [17]. Rast (Resource-aware session types) is
a programming language which implements the intuitionistic version of session
types [7] with support for arithmetic refinements [18], ergometric [16] and
temporal types for complexity analysis. Our prototype extension is implemented
in Standard ML (8011 lines of code) containing a lexer and parser (1214 lines), a 
type checker (3001 lines) and an interpreter (201 lines) and is well-documented. 
The prototype is available in the Rast repository [13]. 


Syntax A program contains a series of mutually recursive type and process 
declarations and definitions, concretely written as 

type V[x1]...[xk] = A

decl f[x1]...[xk] : (c1 : A1) ... (cn : An) |- (c : A)

proc c <- f[x1]...[xk] c1 ... cn = P

Type $V[\overline{x}]$ is represented in concrete syntax as V[x1]...[xk]. The first line
is a type definition, where V is the type name parameterized by type variables
$x_1, \ldots, x_k$ and A is its definition. The second line is a process declaration,
where f is the process name (parameterized by type variables $x_1, \ldots, x_k$),
$(c_1 : A_1) \ldots (c_n : A_n)$ are the used channels and corresponding types, while the
offered channel is c of type A. Finally, the last line is a process definition for the
same process f defined using the process expression P. We use a hand-written
lexer and shift-reduce parser to read an input file and generate the corresponding
abstract syntax tree of the program. The reason to use a hand-written parser
instead of a parser generator is to anticipate the most common syntax errors
that programmers make and respond with the best possible error messages.

Once the program is parsed and its abstract syntax tree is extracted, we
perform a validity check on it. This includes checking that type definitions, and
process declarations and definitions are closed w.r.t. the type variables in scope. 
To simplify and improve the efficiency of the type equality algorithm, we also 
assign internal names to type subexpressions parameterized over their free index 
variables. These internal names are not visible to the programmer. 


Type Checking and Error Messages The implementation is carefully de- 
signed to produce precise error messages. To that end, we store the extent (source 
location) information with the abstract syntax tree, and use it to highlight the 
source of the error. We also follow a bi-directional type checking [40] algorithm 
reconstructing intermediate types starting with the initial types provided in the 
declaration. This helps us precisely identify the source of the error. Another 
particularly helpful technique has been type compression. Whenever the type
checker expands a type $V[\overline{A}]$ defined as $V[\overline{\alpha}] = B$ to $B[\overline{A}/\overline{\alpha}]$, we record a
reverse mapping from $B[\overline{A}/\overline{\alpha}]$ to $V[\overline{A}]$. When printing types for error messages, this
mapping is consulted, and complex types may be compressed to much simpler
forms, greatly aiding readability of error messages.
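
The idea of type compression can be sketched in a few lines of Python. The class `TypePrinter`, its methods, and the tuple encoding of types are hypothetical and only meant to show the reverse mapping at work; the actual implementation is in Standard ML and may differ in every detail.

```python
ONE = ("one",)

def subst(t, env):
    """Substitute types for type variables in a tuple-encoded type."""
    if t[0] == "var": return env.get(t[1], t)
    if t[0] == "plus": return ("plus", tuple((l, subst(a, env)) for l, a in t[1]))
    if t[0] == "tensor": return ("tensor", subst(t[1], env), subst(t[2], env))
    if t[0] == "name": return ("name", t[1], tuple(subst(a, env) for a in t[2]))
    return t

class TypePrinter:
    def __init__(self, defs):
        self.defs = defs
        self.reverse = {}                       # expanded type -> compact V[A] form

    def expand(self, v, args):
        params, body = self.defs[v]
        expanded = subst(body, dict(zip(params, args)))
        self.reverse[expanded] = ("name", v, tuple(args))   # remember where it came from
        return expanded

    def show(self, t):
        t = self.reverse.get(t, t)              # compress if this expansion was recorded
        if t[0] == "name": return t[1] + "".join(f"[{self.show(a)}]" for a in t[2])
        if t[0] == "plus":
            return "+{ " + ", ".join(f"{l} : {self.show(a)}" for l, a in t[1]) + " }"
        if t[0] == "tensor": return f"{self.show(t[1])} * {self.show(t[2])}"
        if t[0] == "var": return t[1]
        return "1"

K = ("var", "K")
defs = {"tm": (("K",), ("plus", (
    ("const", ("tensor", ("name", "bin", ()), K)),
    ("add", ("name", "tm", (("name", "tm", (K,)),))),
    ("double", ("name", "tm", (K,))))))}

printer = TypePrinter(defs)
expanded = printer.expand("tm", [ONE])
compressed = printer.show(expanded)
uncompressed = TypePrinter(defs).show(expanded)   # a printer with no recorded expansions
print(compressed)
print(uncompressed)
```

The same expanded type prints as a single name application when the mapping is available and as the full structural type when it is not.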


9 More Examples 


Expression Server We adapt the example of an arithmetic expression from 
prior work on context-free session types [45]. The type of the server is defined as 


type bin = +{ b0 : bin, b1 : bin, $ : 1 }
type tm[K] = +{ const : bin * K,
                add : tm[tm[K]],
                double : tm[K] }


The type bin represents a constant binary natural number. A process providing
a binary number sends a stream of bits, b0 and b1, starting with the least
significant bit and eventually terminated by $.
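
As a small worked example of this encoding, the Python sketch below decodes such a label stream into a number. The function name `bin_value` and the use of a Python list for the stream are assumptions for illustration.

```python
def bin_value(labels):
    """Decode a stream of b0/b1 labels, least significant bit first, ended by $."""
    value = 0
    for position, label in enumerate(labels):
        if label == "$":
            return value
        if label not in ("b0", "b1"):
            raise ValueError(f"unexpected label {label!r}")
        if label == "b1":
            value += 1 << position      # the bit at this position has weight 2**position
    raise ValueError("stream ended without the terminating $ label")

six = bin_value(["b0", "b1", "b1", "$"])   # bits 0, 1, 1 read from the low end
print(six)
```

The stream b0, b1, b1, $ therefore denotes binary 110, read from the low end.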

An arithmetic term, parameterized by continuation type K can have one of 
three forms: a constant, the sum of two terms, or the double of a term. Conse- 
quently, the type tm[K] ensures that a process providing tm[K] is a well-formed 
term: it either sends the const label followed by sending a constant binary 
number of type bin and continues with type K; or it sends the add label and 
continues with tm[tm[K]], where the two terms denote the two summands; or it 
sends the double label and continues with tm[K]. In particular, the continuation 
type tm[tm[K]] in the add branch enforces that the process must send exactly 
two summands for sums. 

As a first illustration, consider two binary constants a and b, and suppose 
that we want to create the expression a + 2b. We can issue commands to the 
expression server in a prefix notation to obtain a + 2b, as shown in the following
exp[K] process, which is parameterized by a continuation type K. 


decl exp[K] : (a : bin) (b : bin) (k : K) |- (e : tm[K])
proc e <- exp[K] a b k =
  e.add ; e.const ; send e a ;      % (b:bin) (k:K) |- (e : tm[K])
  e.double ; e.const ; send e b ;   % (k:K) |- (e : K)
  e <-> k


In prefix notation, a + 2b would be written + (a) (2 b), which is exactly the form
followed by the exp process: the process sends add, followed by const and the
number a, followed by double, const, and b. Finally, the process continues at 
type K by forwarding k to e (intermediate typing contexts on the right). 
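
The way the type tm[K] threads its continuation can be mimicked sequentially. In the Python sketch below, which is an analogy and not the eval process of the technical report, a term is a stream of labels in prefix order, a constant is represented directly by an integer instead of a channel of type bin, and whatever remains on the stream after evaluation plays the role of the continuation K. All names are hypothetical.

```python
def eval_tm(stream):
    """Consume exactly one well-formed term from the stream and return its value."""
    label = next(stream)
    if label == "const":
        return next(stream)                        # the constant follows its label
    if label == "add":
        return eval_tm(stream) + eval_tm(stream)   # two summands, read left to right
    if label == "double":
        return 2 * eval_tm(stream)
    raise ValueError(f"unexpected label {label!r}")

# a + 2b with a = 3 and b = 5, followed by a marker standing for the continuation K.
stream = iter(["add", "const", 3, "double", "const", 5, "rest-of-session"])
value = eval_tm(stream)
remaining = list(stream)
print(value, remaining)
```

Evaluation consumes the labels of exactly one term and leaves the rest of the stream untouched, mirroring how a client of tm[K] ends up holding a channel of type K.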

To evaluate a term, we can define an eval process, parameterized by type K: 


decl eval[K] : (t : tm[K]) |- (v : bin * K)


The eval process uses channel t : tm[K] as argument, and offers v : bin * K.
The process evaluates term t and sends its binary value along v. The technical
report contains the full implementation [14].


Serializing binary trees Another example from [45] is serializing binary trees. 
Here we adapt that example to our system. Binary trees can be described by: 


type Tree[a] = +{ node : Tree[a] * a * Tree[a] , leaf : 1 } 


These trees are polymorphic in the type of data stored at each internal node. A 
tree is either an internal node or a leaf, with the internal nodes storing channels 
that emit the left subtree, data, and right subtree. Owing to the multiple channels 
stored at each node, these trees do not exist in a serial form. 

We can, however, use a different type to represent serialized trees: 


type STree[a][K] = +{ nd : STree[a][a * STree[a][K]] , lf : K }


A serialized tree is a stream of node and leaf labels, nd and lf, parameterized by
a continuation type K. Like add in the expression server, the label nd continues
with type STree[a][a * STree[a][K]]: the label nd is followed by the serialized
left subtree, which itself continues by sending the data stored at the internal
node and then the serialized right subtree, which continues with type K.³

Using these types, it is relatively straightforward to implement processes that
serialize and deserialize such trees. The process serialize can be declared with:


decl serialize[a][K] : (t : Tree[a]) (k : K) |- (s : STree[a][K])


This process uses channels t and k that hold the tree and continuation, and 
offers that tree’s serialization along channel s. If the tree is only a leaf, then the 
process forwards to the continuation. Otherwise, if the tree begins with a node, 
then the serialization begins with nd. A recursive call to serialize serves to 
serialize the right subtree with the given continuation. A subsequent recursive 
call serializes the left subtree with the data together with the right subtree’s 
serialization as the new continuation. 
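
A sequential analogue in Python may help to see the shape of the emitted stream. This sketch is not the Rast process: trees are plain tuples, the data is written inline rather than sent as a separate channel, and the order of the recursive calls is simply left to right, although the resulting stream follows the same order (nd, left subtree, data, right subtree). The function names are hypothetical.

```python
def serialize(tree, out):
    """Append the serialization of tree (None is a leaf, (left, data, right) a node)."""
    if tree is None:
        out.append("lf")
        return
    left, data, right = tree
    out.append("nd")
    serialize(left, out)     # serialized left subtree
    out.append(data)         # then the data stored at this node
    serialize(right, out)    # then the serialized right subtree

def deserialize(stream):
    """Read exactly one tree from the stream; the rest is the continuation."""
    label = next(stream)
    if label == "lf":
        return None
    left = deserialize(stream)
    data = next(stream)
    right = deserialize(stream)
    return (left, data, right)

tree = ((None, 1, None), 2, None)
out = []
serialize(tree, out)
stream = iter(out + ["rest-of-session"])
rebuilt = deserialize(stream)
rest = list(stream)
print(out, rebuilt == tree, rest)
```

As with the expression server, reading back one tree leaves exactly the continuation on the stream.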

It is also possible to implement a process for deserializing trees, but because 
of space limitations, we will not describe deserialize here. 


decl deserialize[a][K] : (s : STree[a][K]) |- (tk : Tree[a] * K)


Generalized tries for binary trees Using nested types in Haskell, prior 
work [28] describes an implementation of generalized tries that represent map- 
pings on binary trees. Our type system is expressive enough to represent such 
generalized tries. We can reuse the type Tree[a] of binary trees given above. 
The type Trie[a][b] describes tries that represent mappings from Tree[a] to
type b: 


3 The presence of a * means that this is not a true serialization because it sends a 
separate channel along which the data of type a is emitted. But there is no uniform 
mechanism for serializing polymorphic data, so this is as close to a true serialization 
as possible. Concrete instances of type Tree with, say, data of base type int could 
be given a true serialization by “inlining” the data of type int in the serialization. 

type Trie[a][b] = &{ lookup_leaf : b,
                     lookup_node : Trie[a][a -o Trie[a][b]] }


A process for looking up a tree in such tries can be declared by:

decl lookup_tree[a][b] : (m : Trie[a][b]) (t : Tree[a]) |- (v : b)


To look up a tree in a trie, first determine whether that tree is a leaf or a node.
If the tree is a leaf, then sending lookup_leaf to the trie will return the value
of type b associated with that tree in the trie.

Otherwise, if the tree is a node, then sending lookup_node to the trie results
in a trie of type Trie[a][a -o Trie[a][b]] that represents a mapping from
left subtrees to type a -o Trie[a][b]. We then look up the left subtree in this
trie, resulting in a process of type a -o Trie[a][b] to which we send the data
stored at our original tree's root. That results in a trie of type Trie[a][b] that
represents a mapping from right subtrees to type b. Therefore, we finally look up
the right subtree in this new trie and obtain a result of type b, as desired.

We can define a process that constructs a trie from a function on trees: 


decl build_trie[a][b] : (f : Tree[a] -o b) |- (m : Trie[a][b])


Both lookup_tree and build_trie can be seen as analogues to deserialize 
and serialize, respectively, converting a lower-level representation to a higher- 
level representation and vice versa. These types and declarations mean that tries 
represent total mappings; partial mappings are also possible, at the expense of 
some additional complexity. 
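
The lookup procedure and the construction from a function can be mirrored in Python. The sketch below is an analogy under stated assumptions: the trie is backed directly by the function it was built from, the linear function type a -o Trie[a][b] becomes an ordinary Python function, and the names `Trie`, `build_trie`, `lookup_tree` and `total` are chosen here for illustration.

```python
class Trie:
    """Total mapping from trees (None or (left, data, right)) to values."""

    def __init__(self, f):
        self.f = f                        # the function on trees backing this trie

    def lookup_leaf(self):
        return self.f(None)               # value associated with the leaf

    def lookup_node(self):
        # A trie over left subtrees whose values are functions from the node's
        # data to a trie over right subtrees.
        return Trie(lambda left: lambda data:
                    Trie(lambda right: self.f((left, data, right))))

def build_trie(f):
    return Trie(f)

def lookup_tree(m, t):
    if t is None:
        return m.lookup_leaf()
    left, data, right = t
    awaiting_data = lookup_tree(m.lookup_node(), left)   # plays the role of a -o Trie[a][b]
    return lookup_tree(awaiting_data(data), right)

def total(t):
    """Example function on trees: the sum of all stored data."""
    return 0 if t is None else total(t[0]) + t[1] + total(t[2])

tree = ((None, 1, None), 2, (None, 3, None))
found = lookup_tree(build_trie(total), tree)
print(found)
```

Looking up a tree in the trie built from a function returns the same value as applying the function to that tree, which is the sense in which these tries represent total mappings.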

All our examples have been implemented and type checked in the open- 
source Rast repository [13]. We have also further implemented the standard 
polymorphic data structures such as lists, stacks and queues. 


10 Further Related Work 


To our knowledge, our work is the first proposal of polymorphic recursion using 
nested type definitions in session types. Thiemann and Vasconcelos [45] use 
polymorphic recursion to update the channel between successive recursive calls 
but do not allow type constructors or nested types. An algorithm to check type 
equivalence for the non-polymorphic fragment of context-free session types has 
been proposed by Almeida et al. [1].

Other forms of polymorphic session types have also been considered in the 
literature. Gay studies bounded polymorphism associated with branch and 
choice types in the presence of subtyping. He mentions recursive types (which 
are used in some examples) as future work, but does not mention parametric 
type definitions or nested types. Bono and Padovani [4,5] propose (bounded)
polymorphism to type the endpoints in copyless message-passing programs in- 
spired by session types, but they do not have nested types. Following Kobayashi’s 
approach [33], Dardha et al. provide an encoding of session types relying 
on linear and variant types and present an extension to enable parametric and
bounded polymorphism (to which recursive types were added separately [11])
but not parametric type definitions nor nested types. Caires et al. [6] and Perez 
et al. [38] provide behavioral polymorphism and a relational parametricity prin- 
ciple for session types, but without recursive types or type constructors. 

Nested session types bear important similarities with first-order cyclic terms, 
as observed by Jančar. Jančar [80] proves that the trace equivalence problem of 
first-order grammars is decidable, following the original ideas by Stirling for the 
language equality problem in deterministic pushdown automata [43]. These ideas 
were also reformulated by Sénizergues [41]. Henry and Sénizergues proposed 
the only practical algorithm to decide the language equivalence problem on de- 
terministic pushdown automata that we are aware of. Preliminary experiments 
show that such a generic implementation, even if complete in theory, is a poor 
match for the demands made by our type checker. 


11 Conclusion 


Nested session types extend binary session types with parameterized type def- 
initions. This extension enables us to express polymorphic data structures just 
as naturally as in functional languages. The proposed types are able to capture 
sequences of communication actions described by deterministic context-free lan- 
guages recognized by deterministic pushdown automata with several states, that 
accept by empty stack or by final state. In this setting, we show that type equal- 
ity is decidable. To offset the complexity of type equality, we give a practical 
type equality algorithm that is sound, efficient, but incomplete. 

In the future, we are planning to explore subtyping for nested types. In 
particular, since the language inclusion problem for simple languages [22] is 
undecidable, we believe subtyping can be reduced to inclusion and would also 
be undecidable. Despite this negative result, it would be interesting to design 
an algorithm to approximate subtyping. That would significantly increase the 
programs that can be type checked in the system. In another direction, since 
Rast [17] supports arithmetic refinements for lightweight verification, it would
be interesting to explore how refinements interact with polymorphic type pa- 
rameters, namely in the presence of subtyping. We would also like to explore 
examples where the current type equality is not adequate. Finally, protocols in 
distributed algorithms such as consensus or leader election (Raft, Paxos, etc.) 
depend on unbounded memory and cannot usually be expressed with finite con- 
trol structure. In future work, we would like to see if these protocols can be 
expressed with nested session types. 

Abstract. Differential privacy is a de facto standard in data privacy
with applications in the private and public sectors. Most of the tech- 
niques that achieve differential privacy are based on a judicious use of 
randomness. However, reasoning about randomized programs is difficult 
and error prone. For this reason, several techniques have been recently
proposed to support designers in proving programs differentially private
or in finding violations of differential privacy.

In this work we propose a technique based on symbolic execution for 
reasoning about differential privacy. Symbolic execution is a classic tech- 
nique used for testing, counterexample generation and to prove absence of 
bugs. Here we use symbolic execution to support these tasks specifically 
for differential privacy. To achieve this goal, we design a relational sym- 
bolic execution technique which supports reasoning about probabilistic 
coupling, a formal notion that has been shown useful to structure proofs 
of differential privacy. We show how our technique can be used to both 
verify and find violations to differential privacy. 


1 Introduction 


Differential Privacy has become a de facto gold standard definition of privacy
for statistical analysis. This success is mostly due to the generality of the
definition, its robustness and compositionality. However, getting differential
privacy right in practice is a hard task. Even privacy experts have released fragile
code subject to attacks and published incorrect algorithms [16]. This
challenge has motivated the development of techniques to support programmers
in showing that their algorithms are differentially private. Among the techniques
that have been proposed there are type systems [18,20,24,26], methods based on
model checking and program analysis [22,23], and program logics. Several
works have also focused on developing techniques to find violations of
differential privacy [27]. Most of these works focus only on either verifying
that a program is differentially private or finding violations. Exceptions are the
recent works by Barthe et al. and Wang et al. (developed concurrently to our
work), which propose methods that can instead address both.

Motivated by this picture, we propose a new technique named Coupled
Relational Symbolic Execution (CRSE), which supports both proving
differential privacy and finding violations of it. Our technique is based on
two essential ingredients: relational symbolic execution and approximate
probabilistic couplings [8].


Relational Symbolic Execution. Symbolic execution is a classic technique 
used for bug finding, testing and proving. In symbolic execution an evaluator ex- 
ecutes the program which consumes symbolic inputs instead of concrete ones. 
The evaluator follows, potentially, all the execution paths the program could 
take and collects constraints over the symbolic values, corresponding to these 
paths. Similarly, in relational symbolic execution (RSE) one is concerned 
with bug finding, testing, or proving for relational properties. These are prop- 
erties about two executions of two potentially different programs. RSE executes 
two potentially different programs in a symbolic fashion and exploits relational 
assumptions about the inputs or the programs in order to reduce the number 
of states to analyze. This is effective when the codes of the two programs share 
some similarities, and when the property under consideration is relational in 
nature, as in the case of differential privacy. 


Approximate Probabilistic Couplings. Probabilistic coupling is a proof 
technique useful to lift a relation over the support of a joint distribution to a 
relation over the two probability marginals of the joint. This allows one to reason 
about relations between probability distributions by reasoning about relations 
on their support, which can usually be done in a symbolic way. In this approach
the actual probabilistic reasoning is confined to the soundness of the
verification system, rather than being spread everywhere in the verification
effort. A relaxation of the notion of coupling, called approximate
probabilistic coupling, has been designed to reason about differential
privacy. This can be seen as a regular probabilistic coupling with some
additional parameters describing how close the two probability distributions
are.


In this work, we combine these two approaches in a framework called Coupled 
Relational Symbolic Execution. In this framework, a program is executed in a 
relational and symbolic way. When some probabilistic primitive is executed, 
CRSE introduces constraints corresponding to the existence of an approximate 
probabilistic coupling on the output. These constraints are combined with the 
constraints on the execution traces generated by symbolically and relationally 
executing other non-probabilistic commands. These combined constraints can 
be exploited to reduce the number of states to analyze. When the execution is 
concluded CRSE checks whether there is a coupling between the two outputs, 
or whether there is some violation to the coupling. We show the soundness of 
this approach for both proving and refuting differential privacy. However, for 
finding violations, one cannot reason only symbolically, and since checking a 
coupling directly can be computationally expensive, we devise several heuristics 
which can be used to facilitate this task. Using these techniques, CRSE allows 
one to verify differential privacy for an interesting class of programs, including 
programs working on countable input and output domains, and to find violations 
to programs that are not differentially private. 


CRSE is not a replacement for other techniques that have been proposed for
the same task; it should be seen as an additional method in the toolbox of the
privacy developer, one which provides a high level of generality. Indeed,
being a totally symbolic technique, it can leverage a plethora of current
technologies such as SMT solvers, algebraic solvers, and numeric solvers.

Summarizing, the contributions of our work are:


- We combine relational symbolic execution and approximate probabilistic
  coupling in a new technique: Coupled Relational Symbolic Execution.

- We show CRSE sound for proving programs differentially private.

- We devise a set of heuristics (one of them sound, and the others useful)
  that can help a programmer in finding violations of differential privacy.

- We show how CRSE can help in proving and refuting differential privacy for
  an interesting class of programs.


Most of the proofs are omitted here; more details can be found in the
extended version.


2 CRSE Informally 


We will introduce CRSE through three examples of programs showing potential
errors in implementations of differentially private algorithms. Informally, a
randomized function $A$ over a set of databases $\mathcal{D}$ is
$\epsilon$-differentially private ($\epsilon$-DP) if it maps two databases
$D_1$ and $D_2$ that differ in the data of one single individual (denoted
$D_1 \sim D_2$) to output distributions that are indistinguishable up to some
value $\epsilon$, usually referred to as the privacy budget. This is
formalized by requiring that for every $D_1 \sim D_2$ and for every $u$:
$\Pr[A(D_1) = u] \le e^{\epsilon} \Pr[A(D_2) = u]$.
The smaller the $\epsilon$, the more privacy is guaranteed.


Algorithm 1 A bad use of Randomized Response
  Input: $\epsilon \in \mathbb{R}^{+}$, $x_1, x_2 \in \{\mathit{true}, \mathit{false}\}$
  Precondition: $x_1 \neq x_2$
  Postcondition: $o_1 = o_2 \wedge \epsilon_c \le \epsilon$
  1: $o \xleftarrow{\$} \mathrm{RR}_{\epsilon}(x)$
  2: return $o$

Fig. 1: Algorithm 1 is not $\epsilon$-DP.

Randomized response with wrong noise. A standard primitive to achieve
differential privacy when the data is a single boolean is randomized response
[25]. We will use this (simplified) primitive to give an idea of how CRSE
works. This primitive can actually be reduced to the primitives that CRSE
uses, so it won't be included in later sections. The primitive
$\mathrm{RR}_p(b)$ takes as input $p \in (\frac{1}{2}, 1)$ and a boolean $b$,
and it outputs $b$ with probability $p$ and $\neg b$ with probability $1 - p$.
By unfolding the definition of differential privacy it is easy to see that
this primitive is $\log[-p/(p-1)]$-DP. This is internalized in CRSE thanks to
the existence of a $\log[-p/(p-1)]$-approximate lifting (Definition 2) of the
equality relation $=$ between the distributions $\mathrm{RR}_p(b)$ and
$\mathrm{RR}_p(\neg b)$. When CRSE executes line 1, it assumes that
$o_1 = o_2$ and it sets a counter $\epsilon_c$, representing the privacy
budget required by the primitive, to $\log[-\epsilon/(\epsilon-1)]$. In order
to check whether this program is actually $\epsilon$-DP it will then try to
check whether this set of conditions implies the postcondition
$\Psi \equiv o_1 = o_2 \wedge \epsilon_c \le \epsilon$. This implication will
fail. Indeed, there are values of $\epsilon$, say $\epsilon = 0.7$, which give
a value of $\epsilon_c$ that is actually greater than $\epsilon$. This shows
that the user may have confused the $\epsilon$ parameter with the parameter
$p$ that the randomized response primitive takes
as input. If the user substituted line 1 with
$o \xleftarrow{\$} \mathrm{RR}_{e^{\epsilon}/(1+e^{\epsilon})}(x)$,
then CRSE would have considered the following conditions instead: $o_1 = o_2$
and $\epsilon_c = \log[-p/(p-1)] \wedge p = e^{\epsilon}/(1+e^{\epsilon})$.
These conditions would then imply the postcondition $\Psi$, proving the
correctness of the program.
The intuition behind this proof is that every time CRSE executes a random
assignment of the form $o \xleftarrow{\$} \mathrm{RR}_p(x)$, it is allowed to
assume that $o_1 = o_2$ as long as it spends a certain amount of privacy
budget, i.e. $\log[-p/(p-1)]$. These assumptions are recorded in a set of
constraints, which is then checked to see whether it implies the condition
that the two output variables are equal and the budget spent does not exceed
$\epsilon$. As a consequence of the definition of approximate lifting, this
implies differential privacy (Lemma 2). If this fails, CRSE will provide a
counterexample in the form of values for the inputs $x_1, x_2, \epsilon, p$.
Such counterexamples to the postcondition do not necessarily denote a
counterexample to the privacy of the algorithm (as we will see later, the
logic of couplings on which CRSE is based is not complete w.r.t. the
differential privacy notion) but only potential candidates, and hence need to
be further checked.
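The budget check on Algorithm 1 can be replayed numerically. The following Python sketch is illustrative only and is not part of CRSE: the function names are hypothetical, and the parameter $e^{\epsilon}/(1+e^{\epsilon})$ is obtained here simply by solving $\log[p/(1-p)] = \epsilon$ for $p$.

```python
import math


def rr_privacy_cost(p: float) -> float:
    """Privacy cost log(p / (1 - p)) = log[-p / (p - 1)] of RR_p, for p in (1/2, 1)."""
    if not 0.5 < p < 1.0:
        raise ValueError("p must lie in (1/2, 1)")
    return math.log(p / (1.0 - p))


def budget_ok(p: float, eps: float, tol: float = 1e-9) -> bool:
    """Budget half of the postcondition: eps_c <= eps (up to a float tolerance)."""
    return rr_privacy_cost(p) <= eps + tol


def safe_parameter(eps: float) -> float:
    """Value of p whose cost is exactly eps (assumption: solves log(p/(1-p)) = eps)."""
    return math.exp(eps) / (1.0 + math.exp(eps))


eps = 0.7
bad_cost = rr_privacy_cost(eps)                   # Algorithm 1 passes eps where p is expected
good_cost = rr_privacy_cost(safe_parameter(eps))  # cost when p is derived from eps

print(f"eps = {eps}, cost with p = eps: {bad_cost:.4f}, cost with derived p: {good_cost:.4f}")
```

With $\epsilon = 0.7$ the first cost exceeds the budget while the second matches it, which is the failure and the repair described in the text.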


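Algorithm 2 A buggy Above Threshold
  Input: $t, \epsilon \in \mathbb{R}$, $D \in \mathcal{D}$, $q[i] : \mathcal{D} \to \mathbb{N}$
  Output: $o : [\bot^{i}, z, \bot^{n-i-1}]$
  Precondition: $D_1 \sim D_2 \Rightarrow |q[i](D_1) - q[i](D_2)| \le 1$
  Postcondition: $o_1 = o_2 \wedge \epsilon_c \le \epsilon$
  1: $o \leftarrow \bot^{n}$; $r \leftarrow n + 1$
  2: $\hat{t} \xleftarrow{\$} \mathrm{lap}_{\epsilon/2}(t)$
  3: for ($i$ in $1{:}n$) do
  4:     $\hat{s} \xleftarrow{\$} \mathrm{lap}_{\epsilon/4}(q[i](D))$
  5:     if $\hat{s} > \hat{t} \wedge r = n + 1$ then
  6:         $o[i] \leftarrow \hat{s}$; $r \leftarrow i$
  7: return $o$

Algorithm 3 Another buggy Above Threshold
  Input: $t, \epsilon \in \mathbb{R}$, $D \in \mathcal{D}$, $q[i] : \mathcal{D} \to \mathbb{N}$
  Output: $o \in \{\bot, \top\}^{n}$
  Precondition: $D_1 \sim D_2 \Rightarrow |q[i](D_1) - q[i](D_2)| \le 1$
  Postcondition: $o_1 = o_2 \wedge \epsilon_c \le \epsilon$
  1: $\hat{t} \xleftarrow{\$} \mathrm{lap}_{\epsilon/2}(t)$
  2: for ($i$ in $1{:}n$) do
  3:     if $q[i](D) \ge \hat{t}$ then
  4:         $o[i] \leftarrow \top$
  5:     else
  6:         $o[i] \leftarrow \bot$
  7: return $o$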

Two buggy Sparse Vector implementations. The next two examples are variations
of the algorithm above threshold, a component of the sparse vector technique,
a classical technique which is still a subject of study for improvement
[7, 14]. Given a numeric threshold, an array of numeric queries of length $n$,
and a dataset, this algorithm returns the index of the first query whose
result exceeds the threshold, and potentially it should also return the value
of that query. This should be done in a way that preserves differential
privacy. To do this in the right way, a program should add noise to the
threshold, even if it is not sensitive data, add noise to each query, compare
the values, and return the index of the first query for which this comparison
succeeds. The noise that is usually added is sampled from the Laplace
distribution, one of the main primitives in differential privacy. The analysis
of this algorithm is rather complex: it uses the noise on the threshold as a
way to pay only once for all the queries that are below the threshold, and the
noise on the queries to pay for the first and only query that is above the
threshold, if any. Due to this complex analysis [16], this algorithm has been
a benchmark for tools for reasoning about differential privacy [26].

Algorithm 2 has a bug making the (whole) program not differentially private
for values of $n$ greater than 4. The program initializes an array of outputs
$o$ to all bottom values, and a variable $r$ to $n + 1$, which will be used as
a guard in the main loop. It then adds noise to the threshold, and iterates
over all the queries adding noise to their results. If one of the noisy
results is above the noisy threshold, it saves the value in the array of
outputs and updates the value of the guard variable, causing it to exit the
main loop. Otherwise it keeps iterating. The bug is returning the value of the
noisy query that is above the threshold and not only its index, as done by the
instruction in line 6; this is indeed not enough for guaranteeing differential
privacy. For $n < 5$ this program can be shown $\epsilon$-differentially
private by using the composition property of differential privacy, which says
that the $k$-fold composition of $\epsilon$-DP programs is
$k\epsilon$-differentially private (Section 3). However, for $n \ge 5$ the
more sophisticated analysis we described above fails. The proof principle CRSE
will use to try to show this program $\epsilon$-differentially private is to
prove the assertion
$o_1 = \iota \Rightarrow o_2 = \iota \wedge \epsilon_c \le \epsilon$ for every
$\iota \le n$; the soundness of this principle has been proved in [8]. That
is, CRSE will try to prove the following assertions (which would prove the
program $\epsilon$-differentially private):


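• $o_1 = [\hat{s}_1, \bot, \ldots, \bot] \Rightarrow o_2 = [\hat{s}_1, \bot, \ldots, \bot] \wedge \epsilon_c \le \epsilon$
• $o_1 = [\bot, \hat{s}_1, \ldots, \bot] \Rightarrow o_2 = [\bot, \hat{s}_1, \ldots, \bot] \wedge \epsilon_c \le \epsilon$
• $o_1 = [\bot, \ldots, \hat{s}_1] \Rightarrow o_2 = [\bot, \ldots, \hat{s}_1] \wedge \epsilon_c \le \epsilon$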


While proving the first assertion, CRSE will first couple at line 2 the
thresholds as $\hat{t}_1 + k_0 = \hat{t}_2$ for $k_0 \ge 1$, where 1 is the
sensitivity of the queries; this is needed to guarantee that all the query
results below the threshold in one run stay below the threshold in the other
run. It will then increase the privacy budget accordingly, by
$k_0 \frac{\epsilon}{2}$. As a second step it will couple
$\hat{s}_1 + k_1 = \hat{s}_2$ in line 4. Now, the only way for the assertion
$o_1 = [\hat{s}_1, \bot, \bot] \Rightarrow o_2 = [\hat{s}_1, \bot, \bot]$ to
hold is guaranteeing that both $\hat{s}_1 = \hat{s}_2$ and
$\hat{s}_1 > \hat{t}_1 \Rightarrow \hat{s}_2 > \hat{t}_2$ hold. But these two
assertions are not consistent with each other because $k_0 \ge 1$. That is,
the only way, using these coupling rules, to guarantee that the run on the
right follows the same branches as the run on the left (this being necessary
for proving the postcondition) is to couple the samples $\hat{s}_1$ and
$\hat{s}_2$ so that they are different, which necessarily implies the negation
of the postcondition. This would not be the case if we were returning only the
index of the query, since both queries can be above the threshold while
returning different values. Indeed, by substituting line 6 with
$o[i] \leftarrow \top$ the program can be proven $\epsilon$-differentially
private. So the refuting principle CRSE will use here is the one that finds a
trace on the left run such that the only way the right run can be forced to
follow it is by making the output variables different.
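The inconsistency between the two assertions can be observed by brute force on small integers. The sketch below is a simplification made for illustration (it works directly on the noisy samples, ignores the budget, and uses hypothetical names); CRSE itself discharges such constraints symbolically rather than by enumeration.

```python
from itertools import product


def branches_agree(k0: int, k1: int, bound: int = 6) -> bool:
    """Check that (s1 > t1) implies (s2 > t2) for all small integer samples,
    under the couplings t2 = t1 + k0 and s2 = s1 + k1."""
    for t1, s1 in product(range(-bound, bound + 1), repeat=2):
        t2, s2 = t1 + k0, s1 + k1
        if s1 > t1 and not s2 > t2:
            return False  # the left run enters the branch, the right run does not
    return True


# Returning the noisy value forces s1 = s2, i.e. k1 = 0: no k0 >= 1 works.
value_version_ok = any(branches_agree(k0, 0) for k0 in range(1, 5))

# Returning only the index leaves k1 free; choosing k1 = k0 keeps the runs aligned.
index_version_ok = all(branches_agree(k0, k0) for k0 in range(1, 5))

print("value returned:", value_version_ok, "| index returned:", index_version_ok)
```

The first check fails for every tested $k_0$, while the second succeeds, mirroring why only the index-returning variant can be proven private with these couplings.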

A second buggy example of the above threshold algorithm is shown in
Algorithm 3. In this example, in the body of the loop, the test is performed
between the noisy threshold and the actual value of the query on the database;
that is, we don't add noise to the query. For this example CRSE will use
another refuting principle, based on reachability. In particular, it will
vacuously couple the two thresholds at line 1. That is, it will not introduce
any relation between $\hat{t}_1$ and $\hat{t}_2$. CRSE will then search for a
trace which is satisfiable in the first run but not in the second one. This
translates into an output event which has positive probability in the first
run but probability 0 in the second one, leading to an unbounded privacy loss
and making the algorithm not $\epsilon$-differentially private for any finite
$\epsilon$. Interestingly, this unbounded privacy loss can be achieved with
just 2 iterations.
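A two-query instance makes the reachability argument concrete. The Python sketch below is only an illustration under stated assumptions: the query answers, the threshold, the noise parameter, and the discrete Laplace mass function (proportional to $\exp(-\epsilon |z - \mathit{mean}|)$) are all chosen here for the example, and the sum over noisy thresholds is truncated to a finite window.

```python
import math


def discrete_laplace_pmf(z: int, mean: int, eps: float) -> float:
    """Mass at z of a discrete Laplace distribution, proportional to exp(-eps * |z - mean|)."""
    alpha = math.exp(-eps)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(z - mean)


def output_probability(queries, output, t, noise_eps, window=60):
    """Probability that the un-noised comparison loop yields `output`,
    summing over integer noisy thresholds in a window around t."""
    total = 0.0
    for t_hat in range(t - window, t + window + 1):
        produced = [q >= t_hat for q in queries]  # True stands for top, False for bottom
        if produced == output:
            total += discrete_laplace_pmf(t_hat, t, noise_eps)
    return total


t, noise_eps = 1, 0.5
queries_d1 = [0, 1]      # answers on D1 (hypothetical)
queries_d2 = [1, 0]      # answers on an adjacent D2: each query changes by at most 1
event = [False, True]    # output [bottom, top]

p1 = output_probability(queries_d1, event, t, noise_eps)
p2 = output_probability(queries_d2, event, t, noise_eps)
print(f"Pr on D1 = {p1:.4f}, Pr on D2 = {p2:.4f}")
```

On the first database the event needs $0 < \hat{t} \le 1$, which is satisfiable; on the second it needs $1 < \hat{t} \le 0$, which is not, so no finite $\epsilon$ can bound the ratio.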


3 Preliminaries 


Let $A$ be a denumerable set. A subdistribution over $A$ is a function
$\mu : A \to [0,1]$ with weight $\sum_{a \in A} \mu(a)$ less than or equal
to 1. We denote the set of subdistributions over $A$ as $\mathrm{sdistr}(A)$.
When a subdistribution has weight equal to 1, we call it a distribution. We
denote the set of distributions over $A$ by $\mathrm{distr}(A)$. The null
subdistribution $\mu_0 : A \to [0,1]$ assigns mass 0 to every element of $A$.
The Dirac distribution $\mathrm{unit}(a) : A \to [0,1]$ is defined for
$a \in A$ as $\mathrm{unit}(a)(x) = 1$ if $x = a$, and
$\mathrm{unit}(a)(x) = 0$ otherwise. The set of subprobability distributions
can be given the structure of a monad, with unit the function
$\mathrm{unit}$. We also have a function
$\mathrm{bind} \equiv \lambda\mu.\lambda f.\lambda a. \sum_{b \in O} \mu(b) \cdot f(b)(a)$
allowing us to compose subdistributions (as we compose monads). We will use
the notion of $\epsilon$-divergence $\Delta_{\epsilon}(\mu_1, \mu_2)$ between
two subdistributions $\mu_1, \mu_2 \in \mathrm{sdistr}(A)$ to define
approximate coupling. This is defined as
$\Delta_{\epsilon}(\mu_1, \mu_2) = \sup_{E \subseteq A} (\mu_1(E) - \exp(\epsilon) \cdot \mu_2(E))$.
Formally, differential privacy is a property of a probabilistic program:


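Definition 1 (Differential Privacy [8]). Let $\epsilon \ge 0$ and
${\sim} \subseteq \mathcal{D} \times \mathcal{D}$. A program
$A : \mathcal{D} \to \mathrm{distr}(O)$ is $\epsilon$-differentially private
with respect to $\sim$ iff $\forall D \sim D'.\ \forall u \in O$:

$$\Pr[A(D) = u] \le e^{\epsilon} \Pr[A(D') = u]$$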
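For finitely supported subdistributions these notions are easy to experiment with. The following Python sketch is an illustration with hypothetical names (`SubDist`, `rr`), not code from the paper: subdistributions are dictionaries from outcomes to masses, and the supremum in the $\epsilon$-divergence is computed by collecting the points where $\mu_1$ exceeds $\exp(\epsilon) \cdot \mu_2$, which is the event attaining it.

```python
import math
from typing import Callable, Dict, Hashable

SubDist = Dict[Hashable, float]  # outcome -> mass, total mass at most 1


def unit(a: Hashable) -> SubDist:
    """Dirac distribution concentrated on a."""
    return {a: 1.0}


def bind(mu: SubDist, f: Callable[[Hashable], SubDist]) -> SubDist:
    """bind(mu, f)(a) = sum over b of mu(b) * f(b)(a)."""
    out: SubDist = {}
    for b, mass_b in mu.items():
        for a, mass_a in f(b).items():
            out[a] = out.get(a, 0.0) + mass_b * mass_a
    return out


def divergence(eps: float, mu1: SubDist, mu2: SubDist) -> float:
    """sup over events E of mu1(E) - exp(eps) * mu2(E), on a finite support."""
    support = set(mu1) | set(mu2)
    return sum(max(mu1.get(a, 0.0) - math.exp(eps) * mu2.get(a, 0.0), 0.0)
               for a in support)


def rr(p: float, b: bool) -> SubDist:
    """Randomized response: b with probability p, its negation otherwise."""
    return {b: p, (not b): 1.0 - p}


p = 0.75
tight = math.log(p / (1 - p))                         # log 3
d_tight = divergence(tight, rr(p, True), rr(p, False))
d_small = divergence(0.5, rr(p, True), rr(p, False))
mixed = bind({True: 0.5, False: 0.5}, lambda b: rr(p, b))
print(d_tight, d_small, mixed)
```

The divergence vanishes (up to rounding) at $\epsilon = \log 3$ and is strictly positive for the smaller budget $0.5$, matching the pointwise inequality in Definition 1 applied to randomized response with $p = 0.75$.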


The adjacency relation $\sim$ over the set of databases $\mathcal{D}$ models
which pairs of input databases should be indistinguishable to an adversary. In
its most classical definition, $\sim$ relates databases that differ in one
record in terms of Hamming distance. Differentially private programs can be
composed [8]: given programs $A_1$ and $A_2$, respectively $\epsilon_1$- and
$\epsilon_2$-differentially private, their sequential composition
$A(D) = A_2(A_1(D), D)$ is $(\epsilon_1 + \epsilon_2)$-differentially private.
We say that a function $f : \mathcal{D} \to \mathbb{Z}$ is $k$-sensitive if
$|f(x) - f(y)| \le k$ for all $x \sim y$. Functions with bounded sensitivity
can be made differentially private by adding Laplace noise:
Lemma 1 (Laplace Mechanism [8]). Let $\epsilon > 0$, and assume that
$f : \mathcal{D} \to \mathbb{Z}$ is a $k$-sensitive function with respect to
${\sim} \subseteq \mathcal{D} \times \mathcal{D}$. Then the randomized
algorithm mapping $D$ to $f(D) + \nu$, where $\nu$ is sampled from a discrete
version of the Laplace distribution with scale $1/\epsilon$, is
$k\epsilon$-differentially private w.r.t. $\sim$.
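The bound of Lemma 1 can be checked numerically on a small instance. The sketch below is an assumption-laden illustration: it takes the noise to have scale $1/\epsilon$, models the discrete Laplace distribution by a mass proportional to $\exp(-\epsilon |z - \mathit{mean}|)$, and inspects only a finite window of outputs; the function names and the query values 10 and 12 are hypothetical.

```python
import math


def discrete_laplace_pmf(z: int, mean: int, eps: float) -> float:
    """Mass at z of a discrete Laplace with scale 1/eps centred at mean."""
    alpha = math.exp(-eps)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(z - mean)


def worst_case_log_ratio(mean1: int, mean2: int, eps: float, window: int = 40) -> float:
    """Largest |log(Pr[f(D1) + noise = z] / Pr[f(D2) + noise = z])| over a window of z."""
    lo = min(mean1, mean2) - window
    hi = max(mean1, mean2) + window
    return max(
        abs(math.log(discrete_laplace_pmf(z, mean1, eps))
            - math.log(discrete_laplace_pmf(z, mean2, eps)))
        for z in range(lo, hi + 1)
    )


k, eps = 2, 0.3
# f(D1) = 10 and f(D2) = 12 differ by exactly the sensitivity k.
loss = worst_case_log_ratio(10, 10 + k, eps)
print(f"worst-case privacy loss: {loss:.4f} (bound k * eps = {k * eps:.4f})")
```

The observed loss equals $k\epsilon$, so the bound in the lemma is attained when the two query answers differ by the full sensitivity.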


The notion of approximate probabilistic coupling is internalized by the notion 
of approximate lifting [3]. 


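Definition 2. Given $\mu_1 \in \mathrm{distr}(A)$,
$\mu_2 \in \mathrm{distr}(B)$, a relation $\Psi \subseteq A \times B$, and
$\epsilon \in \mathbb{R}$, we say that $\mu_1, \mu_2$ are related by the
$\epsilon$-approximate lifting of $\Psi$, denoted
$\mu_1 (\Psi)^{\sharp\epsilon} \mu_2$, iff there exist
$\mu_L, \mu_R \in \mathrm{distr}(A \times B)$ such that:
1) $\lambda a. \sum_{b} \mu_L(a,b) = \mu_1$ and
$\lambda b. \sum_{a} \mu_R(a,b) = \mu_2$,
2) $\{(a,b) \mid \mu_L(a,b) > 0 \vee \mu_R(a,b) > 0\} \subseteq \Psi$,
3) $\Delta_{\epsilon}(\mu_L, \mu_R) \le 0$.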

Approximate lifting satisfies the following fundamental property [3]:

Lemma 2. Let $\mu_1, \mu_2 \in \mathrm{distr}(A)$, $\epsilon \ge 0$. Then
$\Delta_{\epsilon}(\mu_1, \mu_2) \le 0$ iff
$\mu_1 (=)^{\sharp\epsilon} \mu_2$.


From Lemma 2 we have that an algorithm $A$ is $\epsilon$-differentially
private w.r.t. $\sim$ iff $A(D_1) (=)^{\sharp\epsilon} A(D_2)$ for all
$D_1 \sim D_2$. The next lemma [8], finally, casts the Laplace mechanism in
terms of couplings:


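Lemma 3. Let $L_{v_1,b}$, $L_{v_2,b}$ be two Laplace random variables with
mean $v_1$ and $v_2$ respectively, and scale $b$. Then

$$L_{v_1,b}\ \{(z_1, z_2) \mid z_1 + k = z_2 \in \mathbb{Z} \times \mathbb{Z}\}^{\sharp |k + v_1 - v_2| \epsilon}\ L_{v_2,b}$$

for all $k \in \mathbb{Z}$, $\epsilon \ge 0$.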
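The shift coupling of Lemma 3 can be examined through explicit witnesses in the sense of Definition 2. The Python sketch below is illustrative and rests on assumptions: the scale is taken to be $b = 1/\epsilon$, the discrete Laplace mass is modelled as proportional to $\exp(-\epsilon |z - \mathit{mean}|)$, the witnesses $\mu_L(z, z+k) = \mu_1(z)$ and $\mu_R(z, z+k) = \mu_2(z+k)$ are restricted to a finite window, and all names are hypothetical.

```python
import math


def discrete_laplace_pmf(z: int, mean: int, eps: float) -> float:
    """Mass at z of a discrete Laplace with scale 1/eps centred at mean."""
    alpha = math.exp(-eps)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(z - mean)


def shift_coupling_divergence(v1: int, v2: int, k: int, eps: float,
                              budget: float, window: int = 40) -> float:
    """Divergence at level `budget` between the two witnesses supported on
    the relation z1 + k = z2, summed over a window of z1 around v1."""
    total = 0.0
    for z in range(v1 - window, v1 + window + 1):
        mass_left = discrete_laplace_pmf(z, v1, eps)        # mu_L(z, z + k)
        mass_right = discrete_laplace_pmf(z + k, v2, eps)   # mu_R(z, z + k)
        total += max(mass_left - math.exp(budget) * mass_right, 0.0)
    return total


v1, v2, k, eps = 0, 0, 1, 0.5
lemma_budget = abs(k + v1 - v2) * eps
at_budget = shift_coupling_divergence(v1, v2, k, eps, lemma_budget)
below_budget = shift_coupling_divergence(v1, v2, k, eps, lemma_budget / 2)
print(f"budget {lemma_budget}: {at_budget:.2e}; half budget: {below_budget:.4f}")
```

At the budget $|k + v_1 - v_2| \epsilon$ the divergence is zero up to rounding, while halving the budget leaves a strictly positive divergence, so for these particular witnesses the cost stated in the lemma cannot be lowered.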


4 Concrete languages 


In this section we sketch the two CRSE concrete languages, the unary one PFOR 
and the relational one RPFOR. These will be the basis on which we will design 
our symbolic languages in the next section. 


4.1 PFOR 


PFOR is a basic FOR-like language with arrays, used to represent databases and
other data structures, and probabilistic sampling from the Laplace
distribution. The full syntax is fairly standard and is presented in full in
the extended version [12]. In the following we give a simplified syntax:

$$\mathcal{C} \ni c ::= \mathtt{skip} \mid c;c \mid x \leftarrow e \mid x \xleftarrow{\$} \mathrm{lap}_{e}(e) \mid \mathtt{if}\ e\ \mathtt{then}\ c\ \mathtt{else}\ c \mid \ldots$$


The set of commands $\mathcal{C}$ includes assignments, the skip command,
sequencing, branching, and (not shown) array assignments and looping
constructs. Finally, we also include a primitive instruction
$x \xleftarrow{\$} \mathrm{lap}_{e_2}(e_1)$ to model random sampling from the
Laplace distribution. Arithmetic expressions $e \in \mathcal{E}$ are built out
of integers, array accesses and lengths, and elements in $\mathcal{X}_p$. The
set $\mathcal{X}_p$ contains values denoting random expressions, that is,
values coming from a random assignment or arithmetic expressions involving
such values. We will use capital letters such as $X, Y, \ldots$ to range over
$\mathcal{X}_p$. The set of values is $V = \mathbb{Z} \cup \mathcal{X}_p$. In
Figure 2 we introduce a grammar of constraints for random expressions, where
$X$ ranges over $\mathcal{X}_p$ and $n, n_1, n_2 \in \mathbb{Z}$. The simple
constraints in the syntactic categories $ra$ and $re$ record that a random
value is either associated with a specific distribution, or that the
computation is conditioned on some random expression being greater than 0 or
less than or equal to 0. The latter constraints, as we will see, come from
branching instructions. We treat constraint lists $p, p'$ in Figure 2 as lists
of simple constraints and hence, from now on, we will use the infix operators
$::$ and $@$, respectively, for appending a simple constraint to a constraint
list and for concatenating two constraint lists. The symbol $[]$ denotes the
empty list of probabilistic constraints. Environments in the set $M$, or
probabilistic memories, map program variables to values in $V$, and array
names to elements in $\mathrm{Array} = \bigcup_i V^i$, so the type of a memory
$m \in M$ is $\mathcal{V} \to V \cup \mathcal{A} \to \mathrm{Array}$. We will
distinguish between probabilistic concrete memories in $M$ and concrete
memories in the set
$M_c = \mathcal{V} \to \mathbb{Z} \cup \mathcal{A} \to \bigcup_i \mathbb{Z}^i$.
Probabilistic concrete memories are meant to denote subdistributions over the
set of concrete memories $M_c$.

$$ra ::= X \xleftarrow{\$} \mathrm{lap}_{n_2}(n_1)$$
$$re ::= n \mid X \mid re \oplus re$$
$$P \ni p ::= X = re \mid re > 0 \mid re \le 0 \mid ra \mid p :: p \mid []$$

Fig. 2: Probabilistic constraints

Expressions in PFOR are given meaning through a big-step evaluation semantics
specified by a judgment of the form
$\langle m, e, p \rangle \downarrow_c \langle v, p' \rangle$, where
$m \in M$, $e \in \mathcal{E}$, $p, p' \in P$, $v \in V$. The judgment reads
as: expression $e$ reduces to the value $v$ and probabilistic constraints $p'$
in an environment $m$ with probabilistic concrete constraints $p$. We omit the
rules for this judgment here, but we will present similar rules for the
symbolic languages in the next section. Commands are given meaning through a
small-step evaluation semantics specified by a judgment of the form
$\langle m, c, p \rangle \rightarrow_c \langle m', c', p' \rangle$, where
$m, m' \in M$, $c, c' \in \mathcal{C}$, $p, p' \in P$. The judgment reads as:
the probabilistic concrete configuration $\langle m, c, p \rangle$ steps to
the probabilistic concrete configuration $\langle m', c', p' \rangle$.
Figure 3 shows a selection of the rules defining this judgment. Most of the
rules are self-explanatory, so we only describe the ones which are
nonstandard. Rule lap-ass handles the random assignment. It evaluates the mean
$e_1$ and the scale $e_2$ of the distribution and checks that $e_2$ actually
denotes a positive number. The semantic predicate fresh asserts that the first
argument is drawn nondeterministically from the second argument and that it
was never used before in the computation. Notice that if one of these two
expressions reduces to a probabilistic symbolic value the computation halts.
Rule if-true-prob (and if-false-prob) reduces the guard of a branching
instruction to a value. If the value is a probabilistic symbolic value then it
will nondeterministically choose one of the two branches, recording the choice
made in the list of probabilistic constraints. If instead the value of the
guard is a numerical constant it will choose the appropriate branch
deterministically using the rules if-false and if-true (not shown).

$$\textsf{if-false}\quad \frac{\langle m, e, p \rangle \downarrow_c \langle v, p' \rangle \qquad v \in \mathbb{Z} \qquad v \le 0}{\langle m, \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2, p \rangle \rightarrow_c \langle m, c_2, p \rangle}$$

$$\textsf{if-true-prob}\quad \frac{\langle m, e, p \rangle \downarrow_c \langle v, p' \rangle \qquad v \in \mathcal{X}_p \qquad p'' = p' @ [v > 0]}{\langle m, \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2, p \rangle \rightarrow_c \langle m, c_1, p'' \rangle}$$

$$\textsf{lap-ass}\quad \frac{\langle m, e_1, p \rangle \downarrow_c \langle n_1, p_1 \rangle \qquad \langle m, e_2, p_1 \rangle \downarrow_c \langle n_2, p_2 \rangle \qquad n_2 > 0 \qquad \mathrm{fresh}(X, \mathcal{X}_p) \qquad p' = p_1 @ [X = \mathrm{lap}_{n_2}(n_1)]}{\langle m, x \xleftarrow{\$} \mathrm{lap}_{e_2}(e_1), p \rangle \rightarrow_c \langle m[x \mapsto X], \mathtt{skip}, p' \rangle}$$

Fig. 3: PFOR selected rules

We call a probabilistic concrete configuration of the form
$\langle m, \mathtt{skip}, p \rangle$ final. A set of concrete configurations
$\mathcal{D}$ is called final, denoted $\mathrm{Final}(\mathcal{D})$, if all
its concrete configurations are final. We will use this predicate even for
sets of sets of concrete configurations, with the obvious lifted meaning. As
is clear from the rules, a run of a PFOR program can generate many different
final concrete configurations. A different judgment of the form
$\mathcal{D} \Rightarrow_c \mathcal{D}'$, where
$\mathcal{D}, \mathcal{D}' \in \mathcal{P}(M \times \mathcal{C} \times P)$,
and in particular its transitive and reflexive closure
($\Rightarrow_c^{*}$), will help us in collecting all the possible final
configurations stemming from a computation. We have only one rule that
defines this judgment:

$$\textsf{Sub-distr-step}\quad \frac{\langle m, c, p \rangle \in \mathcal{D} \qquad \mathcal{D}' = (\mathcal{D} \setminus \{\langle m, c, p \rangle\}) \cup \{\langle m', c', p' \rangle \mid \langle m, c, p \rangle \rightarrow_c \langle m', c', p' \rangle\}}{\mathcal{D} \Rightarrow_c \mathcal{D}'}$$

Rule Sub-distr-step nondeterministically selects a configuration
$s = \langle m, c, p \rangle$ from $\mathcal{D}$, removes $s$ from it, and
adds to $\mathcal{D}'$ all the configurations $s'$ that are reachable from
$s$.
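The interplay between the small-step rules and the collecting judgment can be mimicked by a worklist. The Python sketch below is a deliberately reduced model and not the paper's definition: the tuple encoding of commands, the tags `"sample"`, `"gt0"`, `"le0"`, and the choice of keeping final configurations in a separate list are all assumptions made for illustration, and expressions are limited to integer literals and variable names.

```python
from itertools import count

fresh_ids = count()        # source of never-used-before random symbols
SKIP = ("skip",)


def is_symbol(v) -> bool:
    return isinstance(v, tuple) and v[0] == "X"


def evaluate(m, e):
    """Expressions are integer literals or variable names (a simplification)."""
    return m[e] if isinstance(e, str) else e


def step(m, c, p):
    """All configurations reachable in one step from (m, c, p)."""
    kind = c[0]
    if kind == "assign":
        _, x, e = c
        return [({**m, x: evaluate(m, e)}, SKIP, p)]
    if kind == "lap":                               # in the spirit of rule lap-ass
        _, x, mean, scale = c
        n1, n2 = evaluate(m, mean), evaluate(m, scale)
        if is_symbol(n1) or is_symbol(n2) or n2 <= 0:
            return []                               # the computation halts
        sym = ("X", next(fresh_ids))
        return [({**m, x: sym}, SKIP, p + [("sample", sym, n1, n2)])]
    if kind == "if":
        _, e, c1, c2 = c
        v = evaluate(m, e)
        if is_symbol(v):                            # if-true-prob and if-false-prob
            return [(m, c1, p + [("gt0", v)]), (m, c2, p + [("le0", v)])]
        return [(m, c1 if v > 0 else c2, p)]        # if-true and if-false
    if kind == "seq":
        _, c1, c2 = c
        if c1 == SKIP:
            return [(m, c2, p)]
        return [(m2, ("seq", c1b, c2), p2) for m2, c1b, p2 in step(m, c1, p)]
    return []                                       # skip does not step


def run(m, c):
    """Repeatedly replace one configuration by its successors until all are final."""
    pending, finals = [(m, c, [])], []
    while pending:
        m1, c1, p1 = pending.pop()
        if c1 == SKIP:
            finals.append((m1, p1))
        else:
            pending.extend(step(m1, c1, p1))
    return finals


program = ("seq", ("lap", "x", 0, 1),
           ("if", "x", ("assign", "y", 1), ("assign", "y", 0)))
finals = run({}, program)
for memory, constraints in finals:
    print(memory, constraints)
```

Running the example yields two final configurations, one per branch, each carrying the sampling constraint followed by the branch condition that was assumed.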

In Section 3 we defined the notions of lifting, coupling and differential
privacy using subdistributions in the form of functions from a set of atomic
events to the interval $[0,1]$. The semantics of the languages proposed so
far, though, only deals with subdistributions represented as sets of concrete
probabilistic configurations. We now show how to map the latter to the former.
In Figure 4 we define a translation function
($[\![ \cdot ; \cdot ]\!]_{mp}$), together with auxiliary functions, between a
single probabilistic concrete configuration and a subdistribution defined
using the $\mathrm{unit}(\cdot)$/$\mathrm{bind}(\cdot,\cdot)$ constructs. We
make use of the constant subdistribution $\mu_0$, which maps every element to
mass 0 and is usually referred to as the null subdistribution. Also, by
$\mathrm{lap}_{n_2}(n_1)(z)$ we denote the mass of (the discrete version of)
the Laplace distribution centered in $n_1$ with scale $n_2$ at the point $z$.

The idea of the translation is that we can transform a probabilistic concrete
memory $m_s \in M$ into a distribution over fully concrete memories in $M_c$
by sampling from the distributions of the probabilistic variables defined in
$m_s$ in the order they were declared, which is specified by the probabilistic
path constraints. To do this we first build a substitution for the
probabilistic variables which maps them into integers, and then we perform the
substitution on $m_s$. Given a set of probabilistic concrete memories we can
then turn them into a subdistribution by summing up all the translations of
the single probabilistic configurations. Indeed, given two subdistributions
$\mu_1, \mu_2$ defined over the same set we can always define the
subdistribution $\mu_1 + \mu_2$ by the mapping
$(\mu_1 + \mu_2)(a) = \mu_1(a) + \mu_2(a)$.

$$[\![ m_s; p ]\!]_{mp} = \mathrm{bind}([\![ p ]\!]_{p}, \lambda s_o. \mathrm{unit}(s_o(m_s)))$$
$$[\![ [] ]\!]_{p} = \mathrm{unit}([])$$
$$[\![ X = re :: p' ]\!]_{p} = \mathrm{bind}([\![ p' ]\!]_{p}, \lambda s_o. \mathrm{bind}([\![ re ]\!]_{rs}^{s_o}, \lambda z_o. \mathrm{unit}(X = z_o :: s_o)))$$
$$[\![ re > 0 :: p' ]\!]_{p} = \mathrm{bind}([\![ p' ]\!]_{p}, \lambda s_o. \mathrm{bind}([\![ re ]\!]_{rs}^{s_o}, \lambda z_o. \text{if } (z_o > 0) \text{ then } \mathrm{unit}(s_o) \text{ else } \mu_0))$$
$$[\![ re \le 0 :: p' ]\!]_{p} = \mathrm{bind}([\![ p' ]\!]_{p}, \lambda s_o. \mathrm{bind}([\![ re ]\!]_{rs}^{s_o}, \lambda z_o. \text{if } (z_o \le 0) \text{ then } \mathrm{unit}(s_o) \text{ else } \mu_0))$$
$$[\![ \mathrm{lap}_{n_2}(n_1) ]\!]_{rs}^{s} = \lambda z. \mathrm{lap}_{n_2}(n_1)(z)$$
$$[\![ n ]\!]_{rs}^{s} = \mathrm{unit}(n)$$
$$[\![ X ]\!]_{rs}^{s} = \mathrm{unit}(s(X))$$
$$[\![ re_1 \oplus re_2 ]\!]_{rs}^{s} = \mathrm{bind}([\![ re_1 ]\!]_{rs}^{s}, \lambda v_1. \mathrm{bind}([\![ re_2 ]\!]_{rs}^{s}, \lambda v_2. \mathrm{unit}(v_1 \oplus v_2)))$$

Fig. 4: Translation from configurations to subdistributions.
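The translation can be prototyped for the simplest kind of constraint lists. The following Python sketch is a reduced illustration, not the definition in the paper: constraints are tagged tuples listed oldest first (a hypothetical encoding), conditions are restricted to a single symbol rather than arbitrary random expressions, the discrete Laplace mass is assumed proportional to $\exp(-|z - \mathit{mean}| / \mathit{scale})$, and sampling is truncated to a finite window.

```python
import math


def lap_pmf(z: int, mean: int, scale: float) -> float:
    """Assumed discrete Laplace mass, proportional to exp(-|z - mean| / scale)."""
    alpha = math.exp(-1.0 / scale)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(z - mean)


def substitutions(p, window=30):
    """Weighted substitutions (symbol -> integer) induced by a constraint list."""
    dist = [({}, 1.0)]
    for constraint in p:
        kind = constraint[0]
        if kind == "sample":                      # X = lap_scale(mean)
            _, sym, mean, scale = constraint
            dist = [({**s, sym: z}, w * lap_pmf(z, mean, scale))
                    for s, w in dist
                    for z in range(mean - window, mean + window + 1)]
        elif kind == "gt0":                       # conditioning on X > 0
            dist = [(s, w) for s, w in dist if s[constraint[1]] > 0]
        elif kind == "le0":                       # conditioning on X <= 0
            dist = [(s, w) for s, w in dist if s[constraint[1]] <= 0]
    return dist


def translate(m, p):
    """Apply every substitution to the memory and accumulate the mass."""
    out = {}
    for s, w in substitutions(p):
        concrete = tuple(sorted((x, s.get(v, v)) for x, v in m.items()))
        out[concrete] = out.get(concrete, 0.0) + w
    return out


def add(mu1, mu2):
    """Pointwise sum of two subdistributions."""
    return {a: mu1.get(a, 0.0) + mu2.get(a, 0.0) for a in set(mu1) | set(mu2)}


sample = ("sample", "X0", 0, 1)
final_true = ({"x": "X0", "y": 1}, [sample, ("gt0", "X0")])
final_false = ({"x": "X0", "y": 0}, [sample, ("le0", "X0")])
semantics = add(translate(*final_true), translate(*final_false))

mass_y1 = sum(w for mem, w in semantics.items() if dict(mem)["y"] == 1)
total = sum(semantics.values())
print(f"Pr[y = 1] = {mass_y1:.5f}, total mass = {total:.5f}")
```

Each final configuration contributes a subdistribution of weight below 1, and their sum recovers (up to truncation) a full distribution over concrete memories.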

The following lemma states an equivalence between these two representations
of probability subdistributions. The hypotheses of the lemma involve a
well-formedness judgment, $m \vdash p$, which has not been specified for lack
of space but can be found in the extended version. It deals with
well-formedness of the probabilistic path constraint $p$ with respect to the
concrete probabilistic memory $m$.

Lemma 4. If $m \vdash p$ and
$\{\langle m, c, p \rangle\} \Rightarrow_c^{*} \{\langle m_1, \mathtt{skip}, p_1 \rangle, \ldots, \langle m_n, \mathtt{skip}, p_n \rangle\}$
then
$\mathrm{bind}([\![ m; p ]\!]_{mp}, [\![ c ]\!]_{C}) = \sum_{i=1}^{n} [\![ m_i; p_i ]\!]_{mp}$.

This lemma justifies the following definition for the semantics of a program.

Definition 3. The semantics of a program $c$ executed on memory $m_0$ and
probabilistic path constraint $p_0$ is
$[\![ c ]\!]_{C}(m_0, p_0) = \sum_{\langle m, \mathtt{skip}, p \rangle \in \mathcal{D}} [\![ m; p ]\!]_{mp}$,
when $\{\langle m_0, c, p_0 \rangle\} \Rightarrow_c^{*} \mathcal{D}$,
$\mathrm{Final}(\mathcal{D})$, and $m_0 \vdash p_0$. If $p_0 = []$ we write
$[\![ c ]\!]_{C}(m_0)$.


4.2 RPFOR 


In order to be able to reason about differential privacy we will build on top
of PFOR a relational language called RPFOR, with a relational semantics
dealing with pairs of traces. Intuitively, an execution of a single RPFOR
program represents the execution of two PFOR programs. Inspired by the
approach of [19], we extend the grammar of PFOR with a pair constructor
$\langle \cdot \mid \cdot \rangle$ which can be used at the level of values
$\langle v_1 \mid v_2 \rangle$, expressions $\langle e_1 \mid e_2 \rangle$, or
commands $\langle c_1 \mid c_2 \rangle$, where $c_i, e_i, v_i$ for
$i \in \{1,2\}$ are commands, expressions, and values in PFOR. This entails
that pairs cannot be nested. This syntactic invariant is preserved by the
rules handling the branching instruction. Pair constructs are used to indicate
where commands, values, or expressions might be different in the two unary
executions represented by a single RPFOR execution. The sets of expressions
and commands in RPFOR, $\mathcal{E}_r$ and $\mathcal{C}_r$, are generated by
the grammars:

$$\mathcal{E}_r \ni e_r ::= v \mid e \mid \langle e_1 \mid e_2 \rangle \qquad \mathcal{C}_r \ni c_r ::= x \leftarrow e_r \mid x \xleftarrow{\$} \mathrm{lap}_{e}(e_r) \mid c \mid \langle c_1 \mid c_2 \rangle$$

where $v \in V_r$, $e, e_1, e_2 \in \mathcal{E}$, $c, c_1, c_2 \in \mathcal{C}$.
Values can now also be pairs of unary values, that is $V_r = V \cup V^2$.

To define the semantics for RPFOR, we first extend memories to allow program
variables to map to pairs of integers, and array variables to map to pairs of
arrays. In the following we will use the projection functions
$\lfloor \cdot \rfloor_i$ for $i \in \{1,2\}$, which project, respectively,
the first (left) and second (right) elements of a pair construct (i.e.,
$\lfloor \langle c_1 \mid c_2 \rangle \rfloor_i = c_i$,
$\lfloor \langle e_1 \mid e_2 \rangle \rfloor_i = e_i$, with
$\lfloor v \rfloor_i = v$ when $v \in V$), and are homomorphic for other
constructs.

The semantics of expressions in RPFOR is specified through the judgment
$\langle m_1, m_2, e, p_1, p_2 \rangle \downarrow_{rc} \langle v, p_1', p_2' \rangle$,
where $m_1, m_2 \in M$, $p_1, p_2, p_1', p_2' \in P$, $e \in \mathcal{E}_r$,
$v \in V_r$. Similarly, for commands, we have the judgment
$\langle m_1, m_2, c, p_1, p_2 \rangle \rightarrow_{rc} \langle m_1', m_2', c', p_1', p_2' \rangle$.
Again, we use the predicate $\mathrm{Final}(\cdot)$ for configurations
$\langle m_1, m_2, c, p_1, p_2 \rangle$ such that $c = \mathtt{skip}$, and
lift the predicate to sets of configurations as well. Intuitively, a
relational probabilistic concrete configuration
$\langle m_1, m_2, c, p_1, p_2 \rangle$ denotes a pair of probabilistic
concrete states, that is, a pair of subdistributions over the space of
concrete memories. In Figure 5 a selection of the rules defining the judgments
is presented. Most of the rules are quite natural. Notice how branching
instructions combine both probabilistic and relational nondeterminism.


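$$\textsf{r-if-conc-conc-true-false}\quad \frac{\langle m_1, m_2, e, p_1, p_2 \rangle \downarrow_{rc} \langle v, p_1', p_2' \rangle \qquad \lfloor v \rfloor_1, \lfloor v \rfloor_2 \in \mathbb{Z} \qquad \lfloor v \rfloor_1 > 0 \qquad \lfloor v \rfloor_2 \le 0}{\langle m_1, m_2, \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2, p_1, p_2 \rangle \rightarrow_{rc} \langle m_1, m_2, \langle \lfloor c_1 \rfloor_1 \mid \lfloor c_2 \rfloor_2 \rangle, p_1', p_2' \rangle}$$

$$\textsf{r-if-prob-prob-true-false}\quad \frac{\langle m_1, m_2, e, p_1, p_2 \rangle \downarrow_{rc} \langle v, p_1', p_2' \rangle \qquad \lfloor v \rfloor_1, \lfloor v \rfloor_2 \in \mathcal{X}_p}{\langle m_1, m_2, \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2, p_1, p_2 \rangle \rightarrow_{rc} \langle m_1, m_2, \langle \lfloor c_1 \rfloor_1 \mid \lfloor c_2 \rfloor_2 \rangle, \lfloor v \rfloor_1 > 0 @ p_1', \lfloor v \rfloor_2 \le 0 @ p_2' \rangle}$$

$$\textsf{r-pair-step}\quad \frac{\{i, j\} = \{1, 2\} \qquad \langle \lfloor m \rfloor_i, c_i, p_i \rangle \rightarrow_{c} \langle m_i', c_i', p_i' \rangle \qquad c_j' = c_j \qquad p_j' = p_j \qquad m_j' = \lfloor m \rfloor_j}{\langle m_1, m_2, \langle c_1 \mid c_2 \rangle, p_1, p_2 \rangle \rightarrow_{rc} \langle m_1', m_2', \langle c_1' \mid c_2' \rangle, p_1', p_2' \rangle}$$

Fig. 5: RPFOR selected rules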
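So, as in the case of PFOR, we collect sets of relational configurations using
the judgment $\mathcal{R} \Rightarrow_{rc} \mathcal{R}'$ with
$\mathcal{R}, \mathcal{R}' \in \mathcal{P}(M \times M \times \mathcal{C}_r \times P \times P)$,
defined by only one rule:

$$\textsf{Sub-pdistr-step}\quad \frac{\begin{array}{c} \langle m_1, m_2, c, p_1, p_2 \rangle \in \mathcal{R} \\ \mathcal{R}_t = \{\langle m_1', m_2', c', p_1', p_2' \rangle \mid \langle m_1, m_2, c, p_1, p_2 \rangle \rightarrow_{rc} \langle m_1', m_2', c', p_1', p_2' \rangle\} \\ \mathcal{R}' = (\mathcal{R} \setminus \{\langle m_1, m_2, c, p_1, p_2 \rangle\}) \cup \mathcal{R}_t \end{array}}{\mathcal{R} \Rightarrow_{rc} \mathcal{R}'}$$

This rule nondeterministically picks and removes one relational configuration
from a set and adds to it all those configurations that are reachable from it.
As mentioned before, a run of a program in RPFOR corresponds to the execution
of two runs of the program in PFOR. Before making this precise we extend
projection functions to relational configurations in the following way:
$\lfloor \langle m_1, m_2, c, p_1, p_2 \rangle \rfloor_i = \langle m_i, \lfloor c \rfloor_i, p_i \rangle$,
for $i \in \{1,2\}$. Projection functions extend in the obvious way also to
sets of relational configurations. We are now ready to state the following
lemma relating the execution in RPFOR to the one in PFOR:

Lemma 5. Let $i \in \{1,2\}$. Then
$\mathcal{R} \Rightarrow_{rc}^{*} \mathcal{R}'$ iff
$\lfloor \mathcal{R} \rfloor_i \Rightarrow_{c}^{*} \lfloor \mathcal{R}' \rfloor_i$.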


5 Symbolic languages 


In this section we lift the concrete languages, presented in the previous
section, to their symbolic versions (respectively, SPFOR and SRPFOR) by
extending them with symbolic values $X \in \mathcal{X}$. We intentionally use
the same metavariables for symbolic values in $\mathcal{X}$ and
$\mathcal{X}_p$ since they both represent symbolic values of some sort.
However, we assume $\mathcal{X}_p \cap \mathcal{X} = \emptyset$; this is
because we want symbolic values in $\mathcal{X}$ to denote only unknown sets
of integers, rather than sets of probability distributions. The meaning of $X$
should then be clear from the context.


5.1 SPFOR 


SPFOR expressions extend PFOR expressions with symbolic values
$X \in \mathcal{X}$. Commands in SPFOR are the same as in PFOR, but now
symbolic values can appear in expressions.

In order to collect constraints on symbolic values we extend configurations
with sets of constraints over integer values, drawn from the set $S$
(Figure 6a), not to be confused with probabilistic path constraints
(Figure 6b). The former express constraints over integer values, for instance
parameters of the distributions. In particular, constraint expressions include
standard arithmetic expressions with values being symbolic or integer
constants, and array selection. Probabilistic path constraints can now also
contain symbolic integer values, and hence can themselves be symbolic. This is
needed to address examples branching on probabilistic values, such as the
Above Threshold algorithm we discussed in Section 2.

(a) Symbolic constraints. $X \in \mathcal{X}$, $n \in V$.
$$S_e \ni e ::= n \mid X \mid i \mid e \oplus e \mid |e| \mid \mathrm{store}(e, e, e) \mid \mathrm{select}(e, e)$$
$$S \ni s ::= \top \mid e \circ e \mid s \wedge s \mid \neg s \mid \forall i. s$$

(b) Prob. constraints. $c_e \in S_e$, $X \in \mathcal{X}$, $Y \in \mathcal{X}_p$.
$$sra ::= Y \xleftarrow{\$} \mathrm{lap}_{c_e}(c_e)$$
$$sre ::= n \mid X \mid Y \mid re \oplus re$$
$$SP ::= Y = re \mid re > 0 \mid re \le 0 \mid ra$$

Fig. 6: Grammar of constraints

Memories can now contain symbolic values, and we represent arrays in memory
as pairs $(X, v)$, where $v$ is a (concrete or symbolic) integer value representing
the length of the array, and $X$ is a symbolic value representing the array content.
The content of the arrays is kept and refined in the set of constraints by means of
the $\mathrm{select}(\cdot,\cdot)$ and $\mathrm{store}(\cdot,\cdot,\cdot)$ operations. The semantics of expressions is
captured by the judgment $\langle m, e, p, s \rangle \downarrow_{sp} \langle v, p', s' \rangle$, which now includes a set of
constraints over integers. The rules of the judgment are fully described in the
extended version [1]. We briefly describe a selection of the rules. Rule S-P-Op-2
applies when both operands of an arithmetic operation reduce to elements in
$\mathcal{X}_p$. It updates the list of probabilistic constraints appropriately.
Rule S-P-Op-5 instead fires when one of the operands is an integer and the other is
a symbolic value. In this case only the set of symbolic constraints needs to be
updated. Finally, in rule S-P-Op-6 one of the operands reduces to an element
in $\mathcal{X}_p$ and the other to an element in $\mathcal{X}$. We only update the list of probabilistic
constraints, as integer constraints cannot contain symbols in $\mathcal{X}_p$.

The semantics of commands of SPFOR is described by small-step semantics
judgments of the form $\langle m, c, p, s \rangle \rightarrow_{sp} \langle m', c', p', s' \rangle$, including a set of
constraints over integers. We provide a selection of the rules in Figure 7. Rule
S-P-If-sym-true fires when a branching instruction is to be executed and the
guard reduces to either an integer or a value in $\mathcal{X}$; we denote this set of values
by $\mathcal{V}_{is}$. In this case we can proceed with the true branch, recording in the set of
integer constraints the fact that the guard is greater than 0. Rule S-P-If-prob-false
handles a branching instruction whose guard reduces to a value in $\mathcal{X}_p$. In
this case we can proceed in both branches (here we only show one
of the two rules), recording the conditioning fact in the list of probabilistic
constraints. Finally, rule S-P-Lap-Ass handles probabilistic assignment. After
reducing both the expression for the mean and the expression for the scale
to values, we check that both are either integers or symbolic integers. If
that is the case, we record that the scale is greater than 0 and we add a
probabilistic constraint recording the fact that the modified variable now points to
a probabilistic symbolic value related to a Laplace distribution.
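
The three rules just described can be sketched as follows. This is only an illustrative sketch, not the CRSE implementation: the class names `Sym` and `PSym`, the tuple encoding of constraints, and the helper names `step_if` and `step_lap` are all assumptions introduced here, and guards and Laplace parameters are assumed to be already reduced to values.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Sym:
    """Hypothetical encoding of a symbolic integer (an element of X)."""
    name: str


@dataclass(frozen=True)
class PSym:
    """Hypothetical encoding of a probabilistic symbol (an element of X_p)."""
    name: str


_fresh = count()


def fresh_psym():
    """Return a probabilistic symbol that has not been used before."""
    return PSym(f"Y{next(_fresh)}")


def step_if(guard, c_tt, c_ff, p, s):
    """Successors (command, p, s) of `if guard then c_tt else c_ff`.

    p is the list of probabilistic constraints, s the frozenset of
    integer constraints.
    """
    if isinstance(guard, PSym):
        # S-P-If-prob-true / S-P-If-prob-false: record the condition in p.
        return [(c_tt, p + [(guard, ">", 0)], s),
                (c_ff, p + [(guard, "<=", 0)], s)]
    # Guard is an integer or a Sym (the set V_is):
    # S-P-If-sym-true / S-P-If-sym-false record the condition in s.
    return [(c_tt, p, s | {(guard, ">", 0)}),
            (c_ff, p, s | {(guard, "<=", 0)})]


def step_lap(m, x, mean, scale, p, s):
    """S-P-Lap-Ass: bind x to a fresh probabilistic symbol."""
    if isinstance(mean, PSym) or isinstance(scale, PSym):
        raise ValueError("mean and scale must be integers or symbolic integers")
    X = fresh_psym()
    m_new = dict(m)
    m_new[x] = X
    # The scale must be positive; the sample is described by a new
    # probabilistic constraint X <-$ lap_scale(mean).
    return m_new, p + [(X, "lap", scale, mean)], s | {(scale, ">", 0)}
```

Successors whose integer constraints are unsatisfiable are not filtered here; in the paper that filtering is the job of the collecting semantics.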

The semantics of SPFOR has two sources of nondeterminism, from guards 
which reduce to symbolic values, and from guards which reduce to a probabilistic 
symbolic value. The collecting semantics of SPFOR, specified by judgments as 
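S-P-If-sym-true
$$\frac{\langle m, e, p, s \rangle \downarrow_{sp} \langle v, p', s' \rangle \qquad v \in \mathcal{V}_{is}}
{\langle m, \mathtt{if}\ e\ \mathtt{then}\ c_{tt}\ \mathtt{else}\ c_{ff}, p, s \rangle \rightarrow_{sp} \langle m, c_{tt}, p', s' \cup \{v > 0\} \rangle}$$

S-P-If-prob-false
$$\frac{\langle m, e, p, s \rangle \downarrow_{sp} \langle v, p', s' \rangle \qquad v \in \mathcal{X}_p}
{\langle m, \mathtt{if}\ e\ \mathtt{then}\ c_{tt}\ \mathtt{else}\ c_{ff}, p, s \rangle \rightarrow_{sp} \langle m, c_{ff}, p' @ [v \leq 0], s' \rangle}$$

S-P-Lap-Ass
$$\frac{\begin{array}{c}
\langle m, e_a, p, s \rangle \downarrow_{sp} \langle v_a, p', s' \rangle \qquad \langle m, e_b, p', s' \rangle \downarrow_{sp} \langle v_b, p'', s'' \rangle \qquad X\ \mathrm{fresh}(\mathcal{X}_p) \\
v_a, v_b \in \mathcal{V}_{is} \qquad s''' = s'' \cup \{v_b > 0\} \qquad p''' = p'' @ [X \xleftarrow{\$} \mathrm{lap}_{v_b}(v_a)]
\end{array}}
{\langle m, x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a), p, s \rangle \rightarrow_{sp} \langle m[x \mapsto X], \mathtt{skip}, p''', s''' \rangle}$$

Fig. 7: SPFOR: Semantics of commands (selected rules)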


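$\mathcal{H} \Rightarrow_{sp} \mathcal{H}'$ (for sets of configurations $\mathcal{H}$ and $\mathcal{H}'$) takes care of both of them.
The rule for this judgment form is:


s-p-collect
$$\frac{\mathcal{D}_{[s]} \subseteq \mathcal{H} \qquad \mathcal{H}' = \{ \langle m', c', p', s' \rangle \mid \exists \langle m, c, p, s \rangle \in \mathcal{D}_{[s]}\ \text{s.t.}\ \langle m, c, p, s \rangle \rightarrow_{sp} \langle m', c', p', s' \rangle \wedge SAT(s') \}}
{\mathcal{H} \Rightarrow_{sp} (\mathcal{H} \setminus \mathcal{D}_{[s]}) \cup \mathcal{H}'}$$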


Unlike in the deterministic case of the rule Set-Step, where only one
configuration was chosen nondeterministically from the initial set, here we select
nondeterministically a (maximal) set of configurations all sharing the same symbolic
constraints. The notation $\mathcal{D}_{[s]} \subseteq \mathcal{H}$ means that $\mathcal{D}_{[s]}$ is the maximal subset of
configurations in $\mathcal{H}$ which have $s$ as their set of constraints. We write
$\mathcal{H} \xRightarrow{\mathcal{D}_{[s]}}_{sp} \mathcal{H}'$ when we want to make explicit the set of symbolic
configurations, $\mathcal{D}_{[s]}$, that we are using to make the step. Intuitively,
s-p-collect starts from a set of configurations
and reaches all those that are reachable from it: all the configurations that
have a satisfiable set of constraints and are reachable from one of the original
configurations with only one step of the symbolic semantics. Notice that in a set
of constraints we can have constraints involving probabilistic symbols, e.g. if the
i-th element of an array is associated with a random expression. Nevertheless,
the predicate $SAT(\cdot)$ does not need to take into consideration relations involving
probabilistic symbolic constraints, but only relations involving symbolic values
denoting integers. The following coverage lemma connects PFOR with SPFOR,
ensuring that a concrete execution is covered by a symbolic one.
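
One application of s-p-collect can be sketched as below. This is a minimal sketch under stated assumptions: configurations are hashable tuples `(m, c, p, s)`, the one-step relation and the satisfiability check are passed in as functions, and `toy_step` and `toy_sat` are hypothetical stand-ins (a brute-force check over a small integer range for a single variable) rather than the solver-backed procedures a real engine would use.

```python
def collect_step(H, s, step, sat):
    """Apply s-p-collect once to the set of configurations H.

    D is the maximal subset of H whose constraint set equals s. Every
    configuration in D is replaced by its one-step successors whose
    constraint set is satisfiable.
    """
    D = {cfg for cfg in H if cfg[3] == s}
    H_new = {succ for cfg in D for succ in step(cfg) if sat(succ[3])}
    return (H - D) | H_new


def toy_sat(s, domain=range(-3, 4)):
    """Brute-force satisfiability of constraints (var, op, k) over one variable."""
    ops = {">": lambda a, b: a > b, "<=": lambda a, b: a <= b}
    return any(all(ops[op](x, k) for (_, op, k) in s) for x in domain)


def toy_step(cfg):
    """Toy one-step relation: only a branch on the symbolic guard X can step."""
    m, c, p, s = cfg
    if c == "if X then A else B":
        return [(m, "A", p, s | {("X", ">", 0)}),
                (m, "B", p, s | {("X", "<=", 0)})]
    return []
```

Starting from the single configuration with constraints `{X > 1}`, only the true branch survives, because `{X > 1, X <= 0}` is unsatisfiable.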


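Lemma 6 (Probabilistic Unary Coverage). If $\mathcal{H} \xRightarrow{\mathcal{D}_{[s]}}_{sp} \mathcal{H}'$ and $\sigma \models_{\mathbb{Z}} \mathcal{D}_{[s]}$
then $\exists \sigma', \mathcal{D}_{[s']} \subseteq \mathcal{H}'$ such that $\sigma' \models_{\mathbb{Z}} \mathcal{D}_{[s']}$, and $\sigma(\mathcal{D}_{[s]}) \Rightarrow_{p}^{*} \sigma'(\mathcal{D}_{[s']})$.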


5.2 SRPFOR 


The language presented in this section is the symbolic extension of the
concrete language RPFOR. It can also be seen as the relational extension of SPFOR.
The key part of this language's semantics is the handling of the probabilistic
assignment. For that construct we provide two rules instead of one. The
first is the obvious one, which carries out a standard symbolic probabilistic
assignment. The second implements a coupling semantics. The syntax
of SRPFOR, presented in Figure 8, extends the syntax of RPFOR by adding
symbolic values. The main change is in the grammar of expressions, while the
syntax for commands is almost identical to that of RPFOR.


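$$\begin{array}{rcl}
\mathcal{E}_{rs} \ni e_{sr} & ::= & e_s \mid \langle e_s | e_s \rangle \mid e_{sr} \oplus e_{sr} \mid a[e_{sr}] \\
\mathcal{C}_{rs} \ni c_{sr} & ::= & c_s \mid \langle c_s | c_s \rangle \mid c_{sr}; c_{sr} \mid x \leftarrow e_{sr} \mid a[e_{sr}] \leftarrow e_{sr} \mid x \xleftarrow{\$} \mathrm{lap}_{e_{sr}}(e_{sr}) \mid \\
 & & \mathtt{if}\ e_{sr}\ \mathtt{then}\ c_{sr}\ \mathtt{else}\ c_{sr} \mid \mathtt{for}\ (x\ \mathtt{in}\ e_{sr} : e_{sr})\ \mathtt{do}\ c_{sr}\ \mathtt{od}
\end{array}$$

Fig. 8: SRPFOR syntax. $e_s \in \mathcal{E}_s$, $c_s \in \mathcal{C}_s$.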


As in the case of RPFOR, only unary symbolic expressions and commands are
admitted in the pairing construct. This invariant is maintained by the semantics
rules. As for the other languages, we provide a big-step evaluation semantics
for expressions, whose judgments are of the form $\langle m_1, m_2, e, p_1, p_2, s \rangle \downarrow_{srp}
\langle v, p_1', p_2', s' \rangle$. The only rule defining the judgment $\downarrow_{srp}$ is S-R-P-Lift, and it is
presented in the extended version [1]. The rule projects the symbolic relational
expression first on the left and evaluates it to a unary symbolic value, potentially
updating the probabilistic symbolic constraints and the symbolic constraints. It
then does the same projecting the expression on the right, but starting from the
potentially updated constraints. The only case in which the value
returned is unary is when both evaluations return equal integers;
in all other cases a pair of values is returned. So, the relational symbolic
semantics leverages the unary semantics. For the semantics of commands we use
the following evaluation contexts to simplify the exposition:


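$$\begin{array}{rcl}
CTX & ::= & [\cdot] \mid CTX; c \\
\mathcal{P} & ::= & \langle \cdot; c \mid \cdot; c \rangle \mid \langle \cdot \mid \cdot; c \rangle \mid \langle \cdot; c \mid \cdot \rangle \mid \langle \cdot \mid \cdot \rangle
\end{array}$$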


Notice how $\mathcal{P}$ gets saturated by pairs of commands. Moreover, we separate
commands into two classes. We call synchronizing all the commands in $\mathcal{C}_{rs}$ with the
following shapes: $x \xleftarrow{\$} \mathrm{lap}_{e_2}(e_1)$ and $\langle x \xleftarrow{\$} \mathrm{lap}_{e_2}(e_1) \mid x' \xleftarrow{\$} \mathrm{lap}_{e_2'}(e_1') \rangle$, since they allow
synchronization of two runs using coupling rules. We call non synchronizing all the
other commands.
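
The behaviour of S-R-P-Lift described above (evaluate the left projection, then the right projection starting from the updated constraints, and return a unary value only when both results are equal integers) can be sketched as follows. The tuple encoding of expressions and the toy evaluator `eval_unary`, which handles concrete integers only, are assumptions made for illustration; the actual rule is given in the extended version of the paper.

```python
import operator


def project(e, i):
    """Project a relational expression onto run i (1 = left, 2 = right)."""
    if isinstance(e, tuple) and e[0] == "pair":
        return e[i]
    if isinstance(e, tuple) and e[0] == "op":
        return ("op", e[1], project(e[2], i), project(e[3], i))
    return e  # constants and variables are shared by both runs


def eval_unary(m, e, p, s):
    """Toy unary evaluator over concrete integers (no symbolic values)."""
    if isinstance(e, int):
        return e, p, s
    if isinstance(e, str):
        return m[e], p, s
    _, f, a, b = e
    va, p, s = eval_unary(m, a, p, s)
    vb, p, s = eval_unary(m, b, p, s)
    return f(va, vb), p, s


def lift_eval(m1, m2, e, p1, p2, s):
    """Relational evaluation in the style of S-R-P-Lift."""
    v1, p1, s = eval_unary(m1, project(e, 1), p1, s)
    # The right evaluation starts from the constraints updated on the left.
    v2, p2, s = eval_unary(m2, project(e, 2), p2, s)
    if isinstance(v1, int) and isinstance(v2, int) and v1 == v2:
        return v1, p1, p2, s
    return (v1, v2), p1, p2, s
```

For instance, evaluating `x + 2` in two memories that agree on `x` yields a single integer, while memories that disagree on `x` yield a pair.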


Semantics of non synchronizing commands. We consider judgments of the
form $\langle m_1, m_2, c, p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1', m_2', c', p_1', p_2', s' \rangle$, and a selection of the rules
is given in Figure 9. An explanation of the rules follows. Rule
s-r-if-prob-prob-true-false fires when evaluating a branching instruction. In particular, it fires
when the guard evaluates on both sides to a probabilistic symbolic value. In
this case the semantics can continue with the true branch on the left run and
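s-r-if-prob-prob-true-false
$$\frac{\begin{array}{c}
\langle m_1, m_2, e, p_1, p_2, s \rangle \downarrow_{srp} \langle v, p_1', p_2', s' \rangle \qquad \lfloor v \rfloor_1, \lfloor v \rfloor_2 \in \mathcal{X}_p \\
p_1'' = p_1' @ [\lfloor v \rfloor_1 > 0] \qquad p_2'' = p_2' @ [\lfloor v \rfloor_2 \leq 0]
\end{array}}
{\langle m_1, m_2, \mathtt{if}\ e\ \mathtt{then}\ c_{tt}\ \mathtt{else}\ c_{ff}, p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1, m_2, \langle \lfloor c_{tt} \rfloor_1 \mid \lfloor c_{ff} \rfloor_2 \rangle, p_1'', p_2'', s' \rangle}$$

s-r-if-prob-sym-true-false
$$\frac{\begin{array}{c}
\langle m_1, m_2, e, p_1, p_2, s \rangle \downarrow_{srp} \langle v, p_1', p_2', s' \rangle \\
\lfloor v \rfloor_1 \in \mathcal{X}_p \qquad \lfloor v \rfloor_2 \in \mathcal{X} \qquad p_1'' = p_1' @ [\lfloor v \rfloor_1 > 0] \qquad s'' = s' \cup \{\lfloor v \rfloor_2 \leq 0\}
\end{array}}
{\langle m_1, m_2, \mathtt{if}\ e\ \mathtt{then}\ c_{tt}\ \mathtt{else}\ c_{ff}, p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1, m_2, \langle \lfloor c_{tt} \rfloor_1 \mid \lfloor c_{ff} \rfloor_2 \rangle, p_1'', p_2', s'' \rangle}$$

s-r-pair-lap-skip
$$\frac{\langle m_1, x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a), p_1, s \rangle \rightarrow_{sp} \langle m_1', \mathtt{skip}, p_1', s' \rangle}
{\langle m_1, m_2, \langle x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a) \mid \mathtt{skip} \rangle, p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1', m_2, \langle \mathtt{skip} \mid \mathtt{skip} \rangle, p_1', p_2, s' \rangle}$$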


s-r-pair-lapleft-sync 


$ 
c$ ačlapa (ea) P=) (M2, p2, 8) >se (M2, €, po, 5’) 


$ $ 
(mı, m2, P(alape, (ea), c), P1, P2, s) —sRP (mi, md, (x<—lape, (ea)\c’), p1, P2, s’) 


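s-r-pair-ctxt-1
$$\frac{\begin{array}{c}
x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a) \notin \{c_1, c_2\} \qquad |\{c_1, c_2\}| = 2 \qquad \{1, 2\} = \{i, j\} \\
c_i' = c_i \qquad p_i' = p_i \qquad m_i' = m_i \qquad \langle m_j, c_j, p_j, s \rangle \rightarrow_{sp} \langle m_j', c_j', p_j', s' \rangle
\end{array}}
{\langle m_1, m_2, \mathcal{P}(c_1, c_2), p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1', m_2', \mathcal{P}(c_1', c_2'), p_1', p_2', s' \rangle}$$

s-r-pair-ctxt-2
$$\frac{\mathcal{P} \neq \langle \cdot \mid \cdot \rangle \qquad \langle m_1, m_2, \langle c_1 \mid c_2 \rangle, p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1', m_2', \langle c_1' \mid c_2' \rangle, p_1', p_2', s' \rangle}
{\langle m_1, m_2, \mathcal{P}(c_1, c_2), p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1', m_2', \mathcal{P}(c_1', c_2'), p_1', p_2', s' \rangle}$$

Fig. 9: SRPFOR: Semantics of non synchronizing commands. Selected rules.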


with the false branch on the right one. Notice that commands are projected
to avoid pairing commands appearing in a nested form. Rule
s-r-if-prob-sym-true-false applies when the guard of a branching instruction evaluates to a
probabilistic symbolic value on the left run and a symbolic integer value on the
right one. The rule allows the execution to continue on the true branch on the left
run and on the false branch on the right one. Notice that in one case the list of
probabilistic constraints is updated, while in the other the set of symbolic
constraints is updated.
Rule s-r-pair-lap-skip handles the pairing command where on the left-hand
side we have a probabilistic assignment and on the right a skip instruction.
In this case, there is no hope for synchronization between the two runs, and
hence we can just perform the left probabilistic assignment relying on the unary
symbolic semantics. Rule s-r-pair-lapleft-sync instead applies when on the left
we have a probabilistic assignment and on the right we have another arbitrary
command. In this case we can hope to reach a situation where on the right run
another probabilistic assignment appears. Hence, it makes sense to continue the
computation in a unary way on the right side. Again, $\rightarrow_{srp}$ is a nondeterministic
semantics. The nondeterminism comes from the use of probabilistic symbols and
symbolic values as guards, and from the relational approach. So, in order to collect
all the possible traces stemming from such nondeterminism, we define a collecting
semantics relating sets of configurations to sets of configurations.

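The semantics is specified through a judgment of the form $\mathcal{SR} \Rightarrow_{srp} \mathcal{SR}'$,
with $\mathcal{SR}, \mathcal{SR}' \in \mathcal{P}(\mathcal{M}_{SP} \times \mathcal{M}_{SP} \times \mathcal{C}_{rs} \times SP \times SP \times S)$. The only rule defining
the judgment is the following natural lifting of the one for the unary semantics.


s-r-p-collect
$$\frac{\begin{array}{c}
\mathcal{R}_{[s]} \subseteq \mathcal{SR} \qquad \mathcal{SR}' = \{ \langle m_1', m_2', c', p_1', p_2', s' \rangle \mid \\
\exists \langle m_1, m_2, c, p_1, p_2, s \rangle \in \mathcal{R}_{[s]}\ \text{s.t.}\ \langle m_1, m_2, c, p_1, p_2, s \rangle \rightarrow_{srp} \langle m_1', m_2', c', p_1', p_2', s' \rangle \wedge SAT(s') \}
\end{array}}
{\mathcal{SR} \Rightarrow_{srp} (\mathcal{SR} \setminus \mathcal{R}_{[s]}) \cup \mathcal{SR}'}$$


The rule, and the auxiliary notation $\mathcal{R}_{[s]}$, are very similar to those of SPFOR. The
only difference is that here sets of symbolic relational probabilistic configurations
are considered instead of symbolic (unary) probabilistic configurations.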


Semantics of synchronizing commands. We define a new judgment of the
form $\mathcal{G} \rightsquigarrow \mathcal{G}'$, with $\mathcal{G}, \mathcal{G}' \in \mathcal{P}(\mathcal{P}(\mathcal{M}_{SP} \times \mathcal{M}_{SP} \times \mathcal{C}_{rs} \times SP \times SP \times S))$. In Figure 10
we give a selection of the rules. Rule Proof-Step-No-Sync applies when no
synchronizing commands are involved, and hence there is no coupling
rule to be applied. In the other rules, we use the variable $\epsilon_c$ to symbolically
count the privacy budget spent in the current relational execution. The variable gets
increased when the rule Proof-Step-Lap-Gen fires. This symbolic counter variable
is useful when trying to prove equality of certain variables without spending
more than a specific budget. Proof-Step-Lap-Gen is the rule we can use in most
cases when we need to reason about couplings on Laplace distributions. In the set of
sets of configurations $\mathcal{G}$, a set of configurations $\mathcal{SR}$ is nondeterministically chosen.
Among the elements of $\mathcal{SR}$ a configuration is also nondeterministically chosen.
Using contexts we check that in the selected configuration the next command to
execute is the probabilistic assignment. After reducing both the mean
and the scale expressions to values, and verifying (that is, assuming in the set of
constraints) that in the two runs the scales have the same value, the rule adds to the
set of constraints a new element, that is, $E'' = E' + |\lfloor v_a \rfloor_1 - \lfloor v_a \rfloor_2| \cdot K'$, where $K, K', E''$
are fresh symbols denoting integers and $E'$ is the symbolic integer to which the
budget variable $\epsilon_c$ maps. Notice that $\epsilon_c$ needs to point to the same symbol
in both memories. This is because it is a shared variable tracking the privacy
budget spent so far in both runs. This new constraint increases the budget spent.
The other constraint added is the actual coupling relation, that is, $X_1 + K = X_2$,
where $X_1, X_2$ are fresh in $\mathcal{X}$. Later, $K$ will be existentially quantified in order
to search for a proof of $\epsilon$-indistinguishability.
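
As a numeric sanity check of why a coupling of the form $X_1 + K = X_2$ consumes budget, the sketch below uses the standard bound for Laplace densities with a common scale $b$ and means $\mu_1, \mu_2$: the log-ratio between the density of the left sample at $x$ and the density of the right sample at $x + K$ is at most $|\mu_1 + K - \mu_2| / b$. The function names, the use of the continuous Laplace density, and the scale parameterization `b` are assumptions made for illustration; this is not the exact constraint generated by Proof-Step-Lap-Gen.

```python
import math


def laplace_pdf(x, mean, b):
    """Density of a Laplace distribution with the given mean and scale b."""
    return math.exp(-abs(x - mean) / b) / (2 * b)


def max_log_ratio(mu1, mu2, k, b, xs):
    """Largest log-ratio pdf_1(x) / pdf_2(x + k) over the sample points xs."""
    return max(
        math.log(laplace_pdf(x, mu1, b) / laplace_pdf(x + k, mu2, b))
        for x in xs
    )


def coupling_cost(mu1, mu2, k, b):
    """Bound |mu1 + k - mu2| / b on the log-ratio under the shift coupling."""
    return abs(mu1 + k - mu2) / b
```

Choosing $K = \mu_2 - \mu_1$ makes the cost zero, while any other shift is paid for proportionally to its distance from that value.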

Rule Proof-Step-Avoc does not use any coupling rule but treats the sam- 
ples in a purely symbolic manner. It intuitively asserts that the two samples are 
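Proof-Step-No-Sync
$$\frac{\mathcal{SR} \in \mathcal{G} \qquad \mathcal{SR} \Rightarrow_{srp} \mathcal{SR}' \qquad \mathcal{G}' = (\mathcal{G} \setminus \{\mathcal{SR}\}) \cup \{\mathcal{SR}'\}}
{\mathcal{G} \rightsquigarrow \mathcal{G}'}$$

Proof-Step-No-Coup
$$\frac{\begin{array}{c}
\langle m_1, m_2, CTX[x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a)], p_1, p_2, s \rangle \in \mathcal{SR} \in \mathcal{G} \\
\langle m_1, m_2, e_a, p_1, p_2, s \rangle \downarrow_{srp} \langle v_a, p_1', p_2', s_a \rangle \\
\langle m_1, m_2, e_b, p_1', p_2', s_a \rangle \downarrow_{srp} \langle v_b, p_1'', p_2'', s_b \rangle \\
X_1, X_2\ \mathrm{fresh}(\mathcal{X}_p) \qquad m_1' = m_1[x \mapsto X_1] \qquad m_2' = m_2[x \mapsto X_2] \\
p_1''' = p_1'' @ [X_1 \xleftarrow{\$} \mathrm{lap}_{\lfloor v_b \rfloor_1}(\lfloor v_a \rfloor_1)] \qquad p_2''' = p_2'' @ [X_2 \xleftarrow{\$} \mathrm{lap}_{\lfloor v_b \rfloor_2}(\lfloor v_a \rfloor_2)] \\
\mathcal{SR}' = (\mathcal{SR} \setminus \{\langle m_1, m_2, CTX[x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a)], p_1, p_2, s \rangle\}) \cup \{\langle m_1', m_2', CTX[\mathtt{skip}], p_1''', p_2''', s_b \rangle\} \\
\mathcal{G}' = (\mathcal{G} \setminus \{\mathcal{SR}\}) \cup \{\mathcal{SR}'\}
\end{array}}
{\mathcal{G} \rightsquigarrow \mathcal{G}'}$$

Proof-Step-Avoc
$$\frac{\begin{array}{c}
\langle m_1, m_2, CTX[x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a)], p_1, p_2, s \rangle \in \mathcal{SR} \in \mathcal{G} \\
\langle m_1, m_2, e_a, p_1, p_2, s \rangle \downarrow_{srp} \langle v_a, p_1', p_2', s_a \rangle \\
\langle m_1, m_2, e_b, p_1', p_2', s_a \rangle \downarrow_{srp} \langle v_b, p_1'', p_2'', s_b \rangle \\
X_1, X_2\ \mathrm{fresh}(\mathcal{X}) \qquad m_1' = m_1[x \mapsto X_1] \qquad m_2' = m_2[x \mapsto X_2] \\
\mathcal{SR}' = (\mathcal{SR} \setminus \{\langle m_1, m_2, CTX[x \xleftarrow{\$} \mathrm{lap}_{e_b}(e_a)], p_1, p_2, s \rangle\}) \cup \{\langle m_1', m_2', CTX[\mathtt{skip}], p_1'', p_2'', s_b \rangle\} \\
\mathcal{G}' = (\mathcal{G} \setminus \{\mathcal{SR}\}) \cup \{\mathcal{SR}'\}
\end{array}}
{\mathcal{G} \rightsquigarrow \mathcal{G}'}$$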


Proof-Step-Lap-Gen 


$ 
(mı, m2, CT X [rA lape, (ea)], p1, P2, s) € SR € 4 
(M1, M2, €a, P1, P2, 8) {srp (Va, P1, P2, Sa) 


; (mı, Mə, eb, Pi; P2, Sa) sep (vp, Pi; p3, Sp) 
s = sp U {|va]1 = [vo]2, [voli > OF 
1 


mı(ec) = E’ = m3 (ee) 
E”, Xı, X2, K, K' fresh(X) mi = mi|e > Xıllee = E”] 
ms = mole = Xaļlee E”] m(e) = E 
s” =s U{X+K=X,K<K,K -E= lve li, 
E” = E’ +||vaJi — |val2|- K} 
PU’ = OX Šlapio ji (vals) p? : 


P2 = pz @[X2 4 laPivsja (lv al2)] 
8 = (G \ {IR} U{IR} 
(PR \ {(m1, ma, CTX [zË lape, (ea)], p1, p2, 8)}) 


U{(m1, m2, CT X [skip], pi’, ps’, s”)} 


SR' = 


Gm» G’ 


Fig. 10: SRPFOR: Proof collecting semantics, selected rules 
drawn from the distributions and assigns to them arbitrary integers free to vary
over the whole domain of the Laplace distribution.

Finally, rule Proof-Step-No-Coup applies to synchronizing commands as
well. It does not add any relational constraints to the samples. This rule
intuitively means that we are not correlating the two samples in any way. Notice
that, since we are not using any coupling rule, we do not need to check that the
scale value is the same in the two runs, as is required in rule Proof-Step-Lap-Gen. We
could think of this as a way to encode the relational semantics of the program
in an expression which can later be fed as input to other tools.

The main difference with rule Proof-Step-Avoc is that here we treat the sampling
instruction symbolically, and that is why the fresh symbols are in $\mathcal{X}_p$, denoting
subdistributions, rather than in $\mathcal{X}$, denoting sampled integers. When the
program involves a synchronizing command we basically fork the execution when
it is time to execute it. The sets of configurations allow us to explore different
paths, one for every applicable rule.


6 Metatheory 
The coverage lemma can be extended also to the relational setting. 


Lemma 7 (Probabilistic Relational Coverage). If $\mathcal{SR} \xRightarrow{\mathcal{R}_{[s]}}_{srp} \mathcal{SR}'$ and
$\sigma \models_{\mathbb{Z}} \mathcal{R}_{[s]}$ then $\exists \sigma', \mathcal{R}_{[s']}$ such that $\mathcal{R}_{[s']} \subseteq \mathcal{SR}'$, $\sigma' \models_{\mathbb{Z}} \mathcal{R}_{[s']}$, and
$\sigma(\mathcal{R}_{[s]}) \Rightarrow_{rp}^{*} \sigma'(\mathcal{R}_{[s']})$.


This can also be extended to $\rightsquigarrow$ if we consider only the fragment that uses
only the rules Proof-Step-No-Sync and Proof-Step-No-Coup.

The language of relational assertions $\Phi, \Psi, \ldots$ is defined using first-order
predicate logic formulas involving relational program expressions and logical variables
in LogVar. The interpretation of a relational assertion is naturally defined as a
subset of $\mathcal{M}_c \times \mathcal{M}_c$, the set of pairs of memories modeling the assertion. We will
denote by $\llbracket \cdot \rrbracket_{\cdot}$ the substitution function mapping the variables in an assertion to
the values they have in a memory (unary or relational). More details are in [10].


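Definition 4. Let $\Phi, \Psi$ be relational assertions, $c \in \mathcal{C}_r$, and $I : \mathrm{LogVar} \to \mathbb{R}$ be an
interpretation defined on $\epsilon$. We say that $\Phi$ yields $\Psi$ through $c$ within $\epsilon$ under $I$
(and we write $I \Vdash c : \Phi \xrightarrow{\epsilon} \Psi$) iff

1. $\{\{\langle m_{I_1}, m_{I_2}, c, [], [], \llbracket \Phi \rrbracket_{m_I} \rangle\}\} \rightsquigarrow^{*} \mathcal{G}$
2. $\exists \mathcal{H}_{sr} = \{\mathcal{H}_{[s_1]}, \ldots, \mathcal{H}_{[s_t]}\} \in \mathcal{G}$ such that $\mathrm{Final}(\mathcal{H}_{sr})$ and
   $\forall \langle m_1, m_2, \mathtt{skip}, p_1, p_2, s \rangle \in \bigcup_{\mathcal{D} \in \mathcal{H}_{sr}} \mathcal{D}.\ \exists \vec{k}.\ s \Rightarrow \llbracket \Psi \wedge \epsilon_c \leq \epsilon \rrbracket_{\langle m_1 | m_2 \rangle}$

where $m_I \equiv \langle m_{I_1} | m_{I_2} \rangle = \langle m_{I_1}'[\epsilon_c \mapsto 0] \mid m_{I_2}'[\epsilon_c \mapsto 0] \rangle$, $m_{I_1}'$ and $m_{I_2}'$ are fully
symbolic memories, and $\vec{k} = k_1, k_2, \ldots$ are the symbols generated by the
rules for synchronizing commands.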


The idea of this definition is to make the proof search automated. When proving
differential privacy we will usually consider $\Psi$ as being equality of the output
variables in the two runs and $\Phi$ as being our preconditions. We can now prove
the soundness of our approach.


Lemma 8 (Soundness). Let $c \in \mathcal{C}_r$. If $I \Vdash c : D_1 \sim D_2 \xrightarrow{\epsilon} o_1 = o_2$ then $c$ is
$\epsilon$-differentially private.


We can also prove the soundness of refutations obtained by the semantics. 


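Lemma 9 (Soundness for refutation). Suppose that we have a reduction
$\{\{\langle m_1, m_2, c, [], [], \llbracket \Phi \rrbracket_{\langle m_1 | m_2 \rangle} \rangle\}\} \rightsquigarrow^{*} \mathcal{G}$, and $\mathcal{H}_{[s]} \in \mathcal{H} \in \mathcal{G}$, and $\exists \sigma \models_{\mathbb{Z}} s$ such
that $\Delta_{\epsilon}(\llbracket \lfloor c \rfloor_1 \rrbracket_C(\sigma(m_1)), \llbracket \lfloor c \rfloor_2 \rrbracket_C(\sigma(m_2))) > 0$. Then $c$ is not $\epsilon$-differentially private.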


7 Strategies for counterexample finding 


Lemma 9 is hard to use to find counterexamples in practice. For this reason we
will now describe three strategies that can help in reducing the effort in
counterexample finding. These strategies help in isolating traces that could
potentially lead to violations. For this we first need some notation. Given a set of
constraints $s$ we define the triple $\Omega = \langle \Omega_1, C(\vec{k}), \Omega_2 \rangle \equiv \langle \lfloor s \rfloor_1, s \setminus (\lfloor s \rfloor_1 \cup \lfloor s \rfloor_2), \lfloor s \rfloor_2 \rangle$.
We sometimes abuse notation and consider $\Omega$ also as a set of constraints given
by the union of its first, second and third projections, and we will also consider a
set of constraints as a single proposition given by the conjunction of its elements.
The set $C(\vec{k})$ contains relational constraints coming from preconditions,
from invariants, or from the rule Proof-Step-Lap-Gen. The potentially empty
vector $\vec{k} = K_1, \ldots, K_n$ is the set of fresh symbols $K$ generated by that rule. In
the rest of the paper we make the following simplifying assumption.

Assumption 1. Consider $c \in \mathcal{C}_r$ with output variable $o$. Then $c$ is such that
$\{\{\langle m_1, m_2, c, [], [], s \rangle\}\} \rightsquigarrow^{*} \mathcal{G}$ and $\forall \mathcal{H}_{\langle \Omega_1, C(\vec{k}), \Omega_2 \rangle} \in \mathcal{H} \in \mathcal{G}.\ \mathrm{Final}(\mathcal{H}) \wedge o_1 = o_2 \Rightarrow (\Omega_1 \Leftrightarrow \Omega_2)$.


This assumption allows us to consider only programs for which, in order for
the output variable to assume the same value on both runs, it is necessary that
the two runs follow the same branches. That is, if the two outputs differ then the two
executions must have, at some point, taken different branches.

The following definition will be used to distinguish relational traces which
are reachable on one run but not on the other. We call these traces orthogonal.

Definition 5. A final relational symbolic trace is orthogonal when its set of
constraints is such that $\exists \sigma.\ \sigma \not\models \Omega_2$ and $\sigma \models \Omega_1 \wedge C(\vec{k})$. That is, a trace for which
$\neg(\Omega_1 \wedge C(\vec{k}) \Rightarrow \Omega_2)$ is satisfiable.


The next definition, instead, will be used to isolate relational traces for which
it is not possible that the left one is executed but the right one is not. We call
these traces specular.


Definition 6. A final relational symbolic trace is specular if $\exists \vec{k}.\ \Omega_1 \wedge C(\vec{k}) \Rightarrow \Omega_2$.
The constraint $\Omega_1 \wedge C(\vec{k})$ includes all the constraints coming from the branching of
the left projection of the symbolic execution, all the relational assumptions
such as the adjacency condition, and all constraints added by the potentially fired
Proof-Step-Lap-Gen rule. A specular trace is such that its left projection
constraints plus the relational assumptions imply the right projection constraints.
We will now describe our three strategies.
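
To make Definition 6 concrete, the sketch below checks specularity by brute force over a small finite integer domain. It is only an illustration under strong assumptions: constraints are given as Python predicates, the names `is_specular`, `omega1`, `coupling` and `omega2` are hypothetical, and restricting every symbol to a bounded domain can make an implication hold vacuously, so an actual engine would discharge the quantified formula with a solver instead.

```python
from itertools import product


def is_specular(omega1, c, omega2, symbols, ks, domain):
    """Check: exists k such that, for all valuations, omega1 and C(k) imply omega2."""
    for kvals in product(domain, repeat=len(ks)):
        k = dict(zip(ks, kvals))
        implied = True
        for vals in product(domain, repeat=len(symbols)):
            env = dict(zip(symbols, vals))
            if omega1(env) and c(env, k) and not omega2(env):
                implied = False  # counter-valuation for this choice of k
                break
        if implied:
            return True
    return False


# Toy trace: the left run takes a branch guarded by T1 > S1, the right run
# the corresponding branch T2 > S2, and the samples are coupled by shifts.
SYMS = ["T1", "S1", "T2", "S2"]
omega1 = lambda e: e["T1"] > e["S1"]
coupling = lambda e, k: (e["T1"] + k["k0"] == e["T2"]
                         and e["S1"] + k["k1"] == e["S2"])
omega2 = lambda e: e["T2"] > e["S2"]

specular = is_specular(omega1, coupling, omega2, SYMS, ["k0", "k1"], range(-2, 3))
```

Here any choice with `k0 == k1` shifts both runs by the same amount, so the left guard implies the right one and the toy trace is specular.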


Strategy A. In this strategy CRSE uses only the rule Proof-Step-Avoc for
sampling instructions, and searches for orthogonal relational traces.
Under Assumption 1, if such a trace exists for a program then it must be the case that
the program can output one value on one run with some probability but the
same value has probability 0 of being output on the second run. This implies
that for some input the program has an unbounded privacy loss. To implement
this strategy CRSE looks for orthogonal relational traces $\langle m_1, m_2, \mathtt{skip}, p_1, p_2, \Omega \rangle$
such that $\exists \sigma.\ \sigma \models \Omega_1 \wedge C(\vec{k})$ but $\sigma \not\models \Omega_2$. Notice that using this strategy $\vec{k}$ will
always be empty, as the rule used for samplings does not introduce any coupling
between the two samples.


Strategy B. This strategy symbolically executes the program in order to find
a specular trace for which, no matter how we relate, within the budget, the
various pairs of samples $X_1^i, X_2^i$ in the two runs (using the relational schema
$X_1^i + K_i = X_2^i$), the postcondition is always false. That is, CRSE looks for specular
relational traces $\langle m_1, m_2, \mathtt{skip}, p_1, p_2, \Omega \rangle$ such that
$\forall \vec{k}.\ [(\Omega_1 \wedge C(\vec{k}) \Rightarrow \Omega_2) \wedge \llbracket \epsilon_c \leq \epsilon \rrbracket_{\langle m_1 | m_2 \rangle}] \Rightarrow \llbracket o_1 \neq o_2 \rrbracket_{\langle m_1 | m_2 \rangle}$.


Strategy C. This strategy looks for relational traces for which the output
variable takes the same value on the two runs but too much of the budget
was spent. That is, CRSE looks for traces $\langle m_1, m_2, \mathtt{skip}, p_1, p_2, \Omega \rangle$ such that
$\forall \vec{k}.\ [\Omega_1 \wedge C(\vec{k}) \wedge \Omega_2 \Rightarrow \llbracket o_1 = o_2 \rrbracket_{\langle m_1 | m_2 \rangle}] \Rightarrow \llbracket \epsilon_c > \epsilon \rrbracket_{\langle m_1 | m_2 \rangle}$.

Of the presented strategies only strategy A is sound with respect to
counterexample finding, while the other two apply when the algorithm cannot be
proven differentially private by any combination of the rules. In this second case,
though, CRSE provides counterexamples which agree with other refutation-oriented
results in the literature. These strategies are hence termed useful because they
amount to heuristics that can be applied in some situations.


8 Examples 


In this section we will review the examples presented earlier, and variations
thereof, to show how CRSE works.


Unsafe Laplace mechanism: Algorithm 4. This algorithm is not $\epsilon$-differentially
private because the noise is a constant and it is not calibrated to the sensitivity $r$ of
the query $q$. As a consequence, any attempt based on coupling rules uses too
much of the budget. This program has only one possible final relational trace:
$\langle m_1, m_2, \mathtt{skip}, p_1, p_2, \langle \Omega_1, C(\vec{k}), \Omega_2 \rangle \rangle$. Since there are no branching instructions,
$\Omega_1 = \{\lfloor E \rfloor_1 > 0\}$ and $\Omega_2 = \emptyset$, where $m_1(\epsilon) = m_2(\epsilon) = E$. Since there is one
sampling instruction, $C(\vec{k})$ will include $\{|Q_{d_1} - Q_{d_2}| \leq R,\ P_1 + K = P_2,\ E_c = |K| \cdot K' \cdot E,\
O_1 = P_1 + Q_{d_1},\ O_2 = P_2 + Q_{d_2}\}$, with $m_1(o) = O_1$, $m_2(o) = O_2$, $m_1(\epsilon_c) =
m_2(\epsilon_c) = E_c$, $m_i(\rho) = P_i$. Intuitively, given this set of constraints, if it has to
be the case that $O_1 = O_2$ then $Q_{d_1} - Q_{d_2} = K$. But $Q_{d_1} - Q_{d_2}$ can be $R$ and
hence $E_c$ is at least $R$. So, if we want to equate the two outputs we need to spend
$r$ times the budget. Any relational input satisfying the precondition will give us
a counterexample, provided the two projections are different.
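
The step from $O_1 = O_2$ to the budget bound can be spelled out as follows. This is a sketch that assumes the reading of the constraints in $C(\vec{k})$ given above; the final line additionally assumes $|Q_{d_1} - Q_{d_2}| = R$, which the precondition allows.

```latex
\begin{align*}
O_1 = O_2
  &\iff P_1 + Q_{d_1} = P_2 + Q_{d_2} \\
  &\iff P_1 + Q_{d_1} = P_1 + K + Q_{d_2}
      && \text{(coupling } P_1 + K = P_2\text{)} \\
  &\iff K = Q_{d_1} - Q_{d_2}, \\
E_c &= |K| \cdot K' \cdot E
     = |Q_{d_1} - Q_{d_2}| \cdot K' \cdot E
     = R \cdot K' \cdot E .
\end{align*}
```

So the only coupling that equates the outputs costs $R$ times the per-unit price $K' \cdot E$ of the noise, which exceeds $E$ as soon as $R \cdot K' > 1$.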


Algorithm 4 A buggy Laplace mechanism

Input: $q : \mathcal{D} \to \mathbb{Z}$, $D : \mathcal{D}$, $\epsilon : \mathbb{R}^{+}$
Output: o: {true, false}
Precondition: $D_1 \sim D_2 \Rightarrow |q(D_1) - q(D_2)| \leq r$
Postcondition: $o_1 = o_2$
1: $v \leftarrow q(D)$
2: $\rho \xleftarrow{\$} \mathrm{lap}_{\epsilon}(0)$
3: $o \leftarrow v + \rho$
4: return $o$


A safe Laplace mechanism. By substituting line 2 in Algorithm 4 with
$\rho \xleftarrow{\$} \mathrm{lap}_{\epsilon/r}(0)$ we get an $\epsilon$-DP algorithm. Indeed, when executing that line CRSE would
generate the following constraint: $P_1 + K_0 = P_2 \wedge |K_0 + 0 - 0| \leq K_1 \wedge O_1 = V_1 + P_1 \wedge
O_2 = V_2 + P_2$, which, by a suitable instantiation of $K_0$ and $K_1$, implies $O_1 = O_2 \wedge E_c \leq E$.


Unsafe sparse vector implementation: Algorithm 2. We already discussed why
this algorithm is not $\epsilon$-differentially private. The algorithm satisfies Assumption 1 because
it outputs the whole array $o$, which takes
values of the form $\bot^{i}, t$ or $\bot^{n}$ for $1 \leq i \leq n$ and $t \in \mathbb{R}$. The array, hence, encodes
the whole trace. So if two runs of the algorithm output the same value it must
be the case that they followed the same branching instructions. Let us first notice
that the algorithm is trivially $\epsilon$-differentially private, for any $\epsilon$, when the number
of iterations $n$ is less than or equal to 4.

Indeed, it is enough to apply the sequential composition theorem and get the
obvious bound $\frac{\epsilon}{4} \cdot n$.

CRSE can prove this by applying the rule Proof-Step-Lap-Gen $n$ times, and then
choosing $K_1, \ldots, K_n$ all equal to 0. This would imply the statement
of equality of the output variables spending less than $\epsilon$. A potential
counterexample can be found in 5 iterations. If we apply strategy
B to this algorithm and follow the relational symbolic trace that applies the
rule Proof-Step-Lap-Gen for all the samplings, we can isolate the relational
specular trace shown in Figure 11, which corresponds to the left execution
following the false branch for the first four iterations and then following the true
branch and setting the fifth element of the array to the sampled value. Let us
denote the respective final relational configuration by $\langle m_1, m_2, \mathtt{skip}, p_1, p_2, s \rangle$.

Fig. 11: Two runs of the algorithm.


The set of constraints is as follows: $s = \langle \Omega_1, C(\vec{k}), \Omega_2 \rangle = \langle \{T_1 > S_1^1, T_1 > S_2^1,
T_1 > S_3^1, T_1 > S_4^1, T_1 \leq S_5^1\}, \{T_1 + k_0 = T_2, S_1^1 + k_1 = S_1^2, S_2^1 + k_2 = S_2^2,
S_3^1 + k_3 = S_3^2, S_4^1 + k_4 = S_4^2, S_5^1 + k_5 = S_5^2, E_6 = k_0 \frac{\epsilon}{2} + \sum_{i=1}^{5} k_i \frac{\epsilon}{4}\}, \{T_2 > S_1^2,
T_2 > S_2^2, T_2 > S_3^2, T_2 > S_4^2, T_2 \leq S_5^2\} \rangle$, with $m_1(\epsilon_c) = m_2(\epsilon_c) = E_6$, $m_1(o) =
[S_1^1, \ldots, S_5^1]$, $m_2(o) = [S_1^2, \ldots, S_5^2]$, $m_1(t) = T_1$, $m_2(t) = T_2$.
We can see that strategy B applies, because we have $\models \forall \vec{k}.\ [(\Omega_1 \wedge C(\vec{k}) \Rightarrow
\Omega_2) \wedge \llbracket \epsilon_c \leq \epsilon \rrbracket_{\langle m_1 | m_2 \rangle}] \Rightarrow \llbracket o_1 \neq o_2 \rrbracket_{\langle m_1 | m_2 \rangle}$. Computing the probability
associated with these two traces, we can verify that we have a counterexample.
This pair of traces is, in fact, the same that has been reported in the literature for a
slightly more general version of Algorithm 2. Strategy B selects this relational trace
because, in order to make sure that the traces follow the same branches, the coupling
rules necessarily force the two released samples to be different, preventing
CRSE from proving equality of the output variables in the two runs.


Another unsafe sparse vector implementation. This algorithm also
satisfies Assumption 1. The algorithm is $\epsilon$-differentially private for one iteration.
This is because, intuitively, adding noise to the threshold protects the result of
the query at the branching instruction as well, but only for one iteration. The
algorithm is not $\epsilon$-differentially private, for any finite $\epsilon$, already at the second
iteration, and a witness for this can be found using CRSE. We can see this using
strategy A. Thanks to this strategy we will isolate a relational orthogonal trace,
similarly to what has been found in prior work for the same algorithm. CRSE will unfold
the loop twice, and it will scan all relational traces to see if there is an orthogonal
trace. In particular, consider the relational trace that corresponds to the output $o_1 =
o_2 = [\bot, \top]$, that is, the trace with set of constraints $\langle \Omega_1, C(\vec{k}), \Omega_2 \rangle = \langle \{T_1 >
q_{1d_1}, T_1 \leq q_{2d_1}\}, \{|q_{1d_1} - q_{1d_2}| \leq 1, |q_{2d_1} - q_{2d_2}| \leq 1\}, \{T_2 > q_{1d_2}, T_2 \leq q_{2d_2}\} \rangle$.
Since the vector $\vec{k}$ is empty we can omit it and just write $C$. It is easy to see now
that the following assignment, $\sigma \equiv [q_{1d_1} \mapsto 0, q_{2d_1} \mapsto 1, q_{1d_2} \mapsto 1, q_{2d_2} \mapsto 0]$, proves
that this relational trace is orthogonal: that is, $\sigma \models \Omega_1 \wedge C$, but $\sigma \not\models \Omega_2$.
Indeed, if we consider two inputs $D_1, D_2$ and two queries $q_1, q_2$ such that
$q_1(D_1) = q_2(D_2) = 0$ and $q_2(D_1) = q_1(D_2) = 1$, we get that the probability of
outputting the value $o = [\bot, \top]$ is positive in the first run, but it is 0 in the
second. Hence, the algorithm can only be proven to be $\infty$-differentially private.
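
The orthogonality claim for this assignment can be checked mechanically. The sketch below encodes the three constraint sets as Python predicates and looks for threshold values over a small integer range; the dictionary keys, the function names and the search range are assumptions made for illustration, and the thresholds are treated as free integer symbols, as under rule Proof-Step-Avoc.

```python
def omega1(q, t1):
    """Left run outputs [bottom, top]: false branch first, then true branch."""
    return t1 > q["q1d1"] and t1 <= q["q2d1"]


def adjacency(q):
    """Relational precondition: each query changes by at most 1."""
    return (abs(q["q1d1"] - q["q1d2"]) <= 1
            and abs(q["q2d1"] - q["q2d2"]) <= 1)


def omega2(q, t2):
    """Right run follows the same branches as the left run."""
    return t2 > q["q1d2"] and t2 <= q["q2d2"]


# The assignment from the text: q1(D1) = q2(D2) = 0 and q2(D1) = q1(D2) = 1.
sigma = {"q1d1": 0, "q2d1": 1, "q1d2": 1, "q2d2": 0}
thresholds = range(-5, 6)

left_reachable = adjacency(sigma) and any(omega1(sigma, t) for t in thresholds)
right_reachable = any(omega2(sigma, t) for t in thresholds)
```

The left constraints are satisfied with threshold 1, while the right constraints require a threshold that is both greater than 1 and at most 0, which no integer satisfies regardless of the search range.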


A safe sparse vector implementation. Algorithm 2 can be proven $\epsilon$-differentially
private if we replace line 7 with $o[i] \leftarrow \top$. Let us consider a proof of this statement for $n = 5$.
CRSE will try to prove the following postconditions: $o_1 = [\top, \bot, \ldots, \bot] \Rightarrow
o_2 = [\top, \bot, \ldots, \bot] \wedge \epsilon_c \leq \epsilon$, $\ldots$, $o_1 = [\bot, \ldots, \bot, \top] \Rightarrow o_2 = [\bot, \ldots, \bot, \top] \wedge
\epsilon_c \leq \epsilon$. The only interesting iteration will be the $i$-th one; in all the others the
postcondition will be vacuously true. Also, the budget spent will be $k_0 \frac{\epsilon}{2}$, the one
spent for the threshold. For all the other sampling instructions we can spend 0 by
just setting $k_j = q[j](D_2) - q[j](D_1)$ for $j \neq i$, that is, by coupling $\hat{s}_1 + k_j = \hat{s}_2$
with $k_j = q[j](D_2) - q[j](D_1)$, spending $|k_j + q[j](D_1) - q[j](D_2)| = 0$. At
the $i$-th iteration the samples are coupled as $\hat{s}_1 + k_i = \hat{s}_2$, with $k_i = 1$. So if $\hat{s}_1 \geq t_1$
then also $\hat{s}_2 \geq t_2$, and if $\hat{s}_1 < t_1$ then also $\hat{s}_2 < t_2$. This implies that at
the $i$-th iteration we enter the true branch on the right run iff we enter the true
branch on the left one. This is achieved by spending $|k_i + q[i](D_1) - q[i](D_2)| \frac{\epsilon}{4} \leq \frac{\epsilon}{2}$. The
total privacy budget spent will then be equal to $\epsilon$.


9 Related Works 


There is now a wide array of formal techniques for reasoning about differen- 
tial privacy, e.g. 26] [27]. We will discuss here the 
techniques that are closest to our work. In the authors devised a synthe- 
sis framework to automatically discover proofs of privacy using coupling rules 
similar to ours. However, their approach is not based on relational symbolic 
execution but on synthesis technique. Moreover, their framework cannot be di- 
rectly used to find violations of differential privacy. In the authors devise a 
decision logic for differential privacy which can soundly prove or disprove differ- 
ential privacy. The programs considered there do not allow assignments to real 
and integer variables inside the body of while loops. While their technique is 
different from our, their logic could be potentially integrated in our framework 
as a decision procedure. In the recent concurrent work [23], the authors propose 
an automated technique for proving or finding violations to differential privacy 
based on program analysis, standard symbolic execution and on the notion of 
randomness alignment, which in their approach plays the role that approximate 
coupling plays for us here. Their approach focuses on efficiency and scalability, 
while we focus here more on the fundational aspects of our technique. 

Another recent concurrent work combines testing based on (unary) symbolic
execution with approximate coupling for proving differential privacy and finding
violations of it. Their symbolic execution engine is similar to our SPFOR,
and is used to reduce the number of tests that need to be generated, and for
building privacy proofs from concrete executions. Their approach relies more
directly on testing, providing an approximate notion of privacy. As discussed in
their paper, this could potentially be mitigated by using a relational symbolic
execution engine such as the one we propose here, at the cost of using more complex
constraints. Another related work is [15], proposing model checking for finding
counterexamples to differential privacy. The main difference with our work is in
the basic technique and in the fact that model checking reasons about a model
of the code, rather than the code itself. They also consider the above threshold
example and they are able to handle only a finite number of iterations.

Other work has studied how to find violations of differential privacy through
testing [5][6]. The approaches proposed in these works differ from ours in two ways:
first, they use a statistical approach; second, they look at concrete values of the
data and the privacy parameters. By using symbolic execution we are able to
reason about symbolic values, and so consider $\epsilon$-differential privacy for any finite
$\epsilon$. Moreover, our technique does not need sampling, although we still need to
compute distributions to confirm a violation. Our work can be seen as a probabilistic
extension of the framework presented in [10], where sampling instructions
in the relational symbolic semantics are handled through rules inspired by the
logic apRHL+ [3]. This logic can be used to prove differential privacy but does
not directly help in finding counterexamples when the program is not private.


10 Conclusion 


We presented CRSE: a symbolic execution engine framework integrating relational
reasoning and probabilistic couplings. The framework allows both proving
and refuting differential privacy. When proving, CRSE can be seen as a strong
postcondition calculus. When refuting, CRSE uses several strategies to isolate
potentially dangerous traces. Future work includes interfacing CRSE more
efficiently with numeric solvers to find maxima of ratios of probabilities of traces.

Abstract. Deductive verification techniques based on program logics
(i.e., the family of Floyd-Hoare logics) are a powerful approach for pro- 
gram reasoning. Recently, there has been a trend of increasing the ex- 
pressive power of such logics by augmenting their rules with additional 
information to reason about program side-effects. For example, general 
program logics have been augmented with cost analyses, logics for proba- 
bilistic computations have been augmented with estimate measures, and 
logics for differential privacy with indistinguishability bounds. In this 
work, we unify these various approaches via the paradigm of grading, 
adapted from the world of functional calculi and semantics. We propose 
Graded Hoare Logic (GHL), a parameterisable framework for augment- 
ing program logics with a preordered monoidal analysis. We develop a 
semantic framework for modelling GHL such that grading, logical asser- 
tions (pre- and post-conditions) and the underlying effectful semantics 
of an imperative language can be integrated together. Central to our 
framework is the notion of a graded category which we extend here, in- 
troducing graded Freyd categories which provide a semantics that can in- 
terpret many examples of augmented program logics from the literature. 
We leverage coherent fibrations to model the base assertion language, 
and thus the overall setting is also fibrational. 


1 Introduction 


The paradigm of grading is an emerging approach for augmenting language se- 
mantics and type systems with fine-grained information [40]. For example, a 
graded monad provides a mechanism for embedding side-effects into a pure lan- 
guage, exactly as in the approach of monads, but where the types are aug- 
mented (“graded”) with information about what effects may occur, akin to 
a type-and-effect system [24,42]. As another example, graded comonadic type 
operators in linear type systems can capture non-linear dataflow and proper- 
ties of data use [7,16,44]. In general, graded types augment a type system with 
some algebraic structure which serves to give a parameterisable fine-grained pro- 
gram analysis capturing the underlying structure of a type theory or semantics. 
Much of the work in graded types has arisen in conjunction with categorical
semantics, in which graded modal type operators are modelled via graded mon- 
ads [13,17,25,36,33], graded comonads (often with additional graded monoidal 
structure) [7,16,25,43,44], graded ‘joinads’ [36], graded distributive laws between 
graded (co)monads [15], and graded Lawvere theories [27]. 

So far grading has mainly been employed to reason about functional languages
and calculi, thus the structure of the $\lambda$-calculus has dictated the structure
of categorical models (although some recent work connects graded monads
with classical dataflow analyses on CFGs [21]). We investigate here the paradigm
of grading instead applied to imperative languages. As it happens, there is al- 
ready a healthy thread of work in the literature augmenting program logics 
(in the family of Floyd-Hoare logics) with analyses that resemble notions of 
grading seen more recently in the functional world. The general approach is to 
extend the power of deductive verification by augmenting program logic rules 
with an analysis of side effects, tracked by composing rules. For example, work 
in the late 1980s and early 1990s augmented program logics with an analysis of 
computation time, accumulating a cost measure [37,38], with more recent fine- 
grained resource analysis based on multivariate analysis associated to program 
variables [8]. As another example, the Union Bound Logic of Barthe et al. [5] 
defines a Hoare-logic-style system for reasoning about probabilistic computations
with judgments $\vdash_\beta c : \phi \Rightarrow \psi$ for a program $c$ annotated by the maximum
probability $\beta$ (the union bound) that $\psi$ does not hold. The inference rules of
Union Bound Logic track and compute the union bound alongside the standard
rules of Floyd-Hoare logic. As a last example, Approximate Relational Hoare
Logic [2,6,39,48] augments a program logic with measures of the $\epsilon$-$\delta$ bounds for
reasoning about differential privacy.

In this work, we show how these disparate approaches can be unified by 
adapting the notion of grading to an imperative program-logic setting, for which 
we propose Graded Hoare Logic (GHL): a parameterisable program logic and 
reasoning framework graded by a preordered monoidal analysis. Our core con- 
tribution is GHL’s underlying semantic framework which integrates grading, 
logical assertions (pre- and post-conditions) and the effectful semantics of an 
imperative language. This framework allows us to model, in a uniform way, the 
different augmented program logics discussed above. 

Graded models of functional calculi tend to adopt either a graded monadic or 
graded comonadic model, depending on the direction of information flow in the 
analysis. We use the opportunity of an imperative setting (where the $\lambda$-calculus’
asymmetrical ‘many-inputs-to-one-output’ model is avoided) to consider a more 
flexible semantic basis of graded categories. Graded categories generalise graded 
(co)monadic approaches, providing a notion of graded denotation without im- 
posing on the placement (or ‘polarity’) of grading. 


Outline. Section 2 begins with an overview of the approach, focusing on the
example of Union Bound Logic and highlighting the main components of our semantic
framework. The next three sections then provide the central contributions:

- Section 3 defines GHL and its associated assertion logic which provides a
flexible, parameterisable program logic for integrating different notions of
side-effect reasoning, parameterised by a preordered monoidal analysis. We
instantiate the program logic to various examples.

- Section 4 explores graded categories, an idea that has not been explored much
in the literature, and for which there exist various related but
not-quite-overlapping definitions. We show that graded categories can abstract graded
monadic and graded comonadic semantics. We then extend graded categories
to Freyd categories (generally used as a more flexible model of effects than
monads), introducing the novel structure of graded Freyd categories.

- Section 5 develops the semantic framework for GHL, based on graded Freyd
categories in a fibrational setting (where coherent fibrations [22] model the
assertion logic) integrated with the graded Freyd layer. We instantiate the
semantic model to capture the examples presented in Section 3 and others
drawn from the literature mentioned above.


An extended version of this paper provides appendices which include further 
examples and proof details [14]. 


2 Overview of GHL and Prospectus of its Model 


As discussed in the introduction, several works explore Hoare logics combined
with some form of implicit or explicit grading for program analysis. Our aim is
to study these in a uniform way. We informally introduce our approach here.
We start with an example which can be derived in Union Bound Logic [5]:


\[ \vdash_{0.05} \{\top\}\ \mathbf{do}\ v_1 \leftarrow \mathtt{Gauss}(0,1);\ \mathbf{do}\ v_2 \leftarrow \mathtt{Gauss}(0,1);\ v := \max(v_1, v_2)\ \{v \le 2\} \]


This judgment has several important components. First, we have primitives for
procedures with side-effects such as $\mathbf{do}\ v_1 \leftarrow \mathtt{Gauss}(0,1)$. This procedure samples
a random value from the standard normal distribution with mean 0 and variance
1 and stores the result in the variable $v_1$. This kind of procedure with side
effects differs from a regular assignment such as $v := \max(v_1, v_2)$, which is instead
considered to be pure (with respect to probabilities) in our approach.

The judgment has grade ‘0.05’ which expresses a bound on the probability 
that the postcondition is false, under the assumption of the precondition, after 
executing the program; we can think of it as the probability of failing to guarantee 
the postcondition. In our example (call it program P), since the precondition 
is true, this can be expressed as: $\Pr_{\llbracket P \rrbracket(m)}[v > 2] \le 0.05$ where $\llbracket P \rrbracket(m)$ is the
probability distribution generated in executing the program. The grade of P in 
this logic is derived using three components. First, sequential composition: 


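\[ \frac{\vdash_{\beta} \{\psi\}\ P_1\ \{\psi_1\} \qquad \vdash_{\beta'} \{\psi_1\}\ P_2\ \{\phi\}}{\vdash_{\beta+\beta'} \{\psi\}\ P_1; P_2\ \{\phi\}} \]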


which sums the failure probabilities. Second, an axiom for Gaussian distribution: 


\[ \vdash_{0.025} \{\top\}\ \mathbf{do}\ v \leftarrow \mathtt{Gauss}(0,1)\ \{v \le 2\} \]


Graded Hoare Logic and its Categorical Semantics 237 


with a basic constant 0.025 which comes from the property of the Gaussian
distribution we are considering. Third, the following judgment, which is derivable
by the assignment and consequence rules; these are the ones from Hoare
Logic with a trivial grading 0, which is the unit of addition:


\[ \vdash_{0} \{v_1 \le 2 \wedge v_2 \le 2\}\ v := \max(v_1, v_2)\ \{v \le 2\} \]
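
The arithmetic behind this derivation can be checked numerically. The following Python sketch is not part of Union Bound Logic; it assumes the standard normal tail is computed with `math.erfc`, and the names `gaussian_upper_tail`, `AXIOM_GRADE` and `total_grade` are illustrative only.

```python
import math


def gaussian_upper_tail(threshold: float, mean: float = 0.0, variance: float = 1.0) -> float:
    """Return Pr[X > threshold] for X ~ Gauss(mean, variance)."""
    std = math.sqrt(variance)
    return 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2.0)))


# Grade attached to the sampling axiom; it must dominate the true tail probability.
AXIOM_GRADE = 0.025
tail = gaussian_upper_tail(2.0)  # tail of the standard normal beyond 2

# Sequential composition sums grades; the pure assignment contributes the unit 0.
component_grades = [AXIOM_GRADE, AXIOM_GRADE, 0.0]
total_grade = sum(component_grades)

print(f"Pr[v > 2] = {tail:.4f}, bounded by the axiom grade {AXIOM_GRADE}")
print(f"grade of the whole program = {total_grade}")
```

The two sampling steps each contribute the axiom grade and the assignment contributes nothing, which is how the grade 0.05 of the example arises.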


Judgments for more complex examples can be derived using the rules for
conditionals and loops. These rules also consider grading, and the grading can depend
on properties of the program. For example, the rule for conditionals is:


\[ \frac{\vdash_{\beta} \{\psi \wedge e_b = \mathtt{tt}\}\ P_1\ \{\phi\} \qquad \vdash_{\beta} \{\psi \wedge e_b = \mathtt{ff}\}\ P_2\ \{\phi\}}{\vdash_{\beta} \{\psi\}\ \mathbf{if}\ e_b\ \mathbf{then}\ P_1\ \mathbf{else}\ P_2\ \{\phi\}} \]


This allows one to reason also about the grading in a conditional way, through
the two assumptions $\psi \wedge e_b = \mathtt{tt}$ and $\psi \wedge e_b = \mathtt{ff}$. We give more examples later.

Other logics share a similar structure as that described above for the Union 
Bound logic, for example the relational logic apRHL [2], and its variants [48,49], 
for reasoning about differential privacy. Others again use a similar structure 
implicitly, for example the Hoare Logic to reason about asymptotic execution 
cost by Nielson [37], Quantitative Hoare Logic [8], or the relational logic for 
reasoning about program counter security presented by Barthe [3]. 

To study the semantics of these logics in a uniform way, we first abstract the 
logic itself. We design a program logic, which we call Graded Hoare Logic (GHL), 
containing all the components discussed above. In particular, the language is a 
standard imperative language with conditionals and loops. Since our main focus
is studying the semantics of grading, for simplicity we avoid using a ‘while’ loop,
using instead a bounded ‘loop’ operation ($\mathbf{loop}\ e\ \mathbf{do}\ P$). This allows us to focus
on the grading structures for total functions, leaving the study of the interaction 
between grading and partiality to future work. The language is parametric in the 
operations that are supported in expressions—common in several treatments of 
Hoare Logic—and in a set of procedures and commands with side effects, which 
are the main focus of our work. GHL is built over this language and an assertion 
logic which is parametric in the basic predicates that can be used to reason about 
programs. GHL is also parametric in a preordered monoid of grades, and in the 
axioms associated with basic procedures and commands with side effects. This 
generality is needed in order to capture the different logics we mentioned before. 

GHL gives us a unified syntax, but our real focus is the semantics. To be 
as general as possible we turn to the language of category theory. We give a 
categorical framework which can capture different computational models and 
side effects, with denotations that are refined by predicates and grades describ- 
ing program behaviours. Our framework relates different categories (modelling 
different aspects of GHL) as summarized by the following informal diagram (1). 


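[Diagram (1): a square with the graded Freyd category $\dot{I} : \mathbb{P} \to \mathbb{E}$ on top, the Freyd category $I : \mathbb{V} \to \mathbb{C}$ at the bottom, the fibration $p : \mathbb{P} \to \mathbb{V}$ on the left, and $q : \mathbb{E} \to \mathbb{C}$ on the right.]

This diagram should not be understood as a commutative diagram in CAT, as
$\mathbb{E}$ is a graded category and hence not an object of CAT.

The category $\mathbb{V}$ models values and pure computations, the category $\mathbb{C}$ models
impure computations, $\mathbb{P}$ is a category of predicates, and $\mathbb{E}$ is a graded category
whose hom-sets are indexed by grades (elements of a preordered monoid). The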
presentation of graded categories is new here, but has some relation to other 
structures of the same name (discussed in Section 4). 

This diagram echoes the principle of refinement as functors proposed by
Melliès and Zeilberger [32]. The lower part of the diagram offers an interpretation
of the language, while the upper part offers a logical refinement of programs
with grading. However, our focus is to introduce a new graded refinement view.
The ideas we use to achieve this are to interpret the base imperative language
using a Freyd category $I : \mathbb{V} \to \mathbb{C}$ (traditionally used to model effects) with
countable coproducts, to interpret the assertion logic with a coherent fibration
$p : \mathbb{P} \to \mathbb{V}$, and to interpret GHL as a graded Freyd category $\dot{I} : \mathbb{P} \to \mathbb{E}$ with
homogeneous coproducts. In addition, the graded category $\mathbb{E}$ has a functor$^5$ $q$
into $\mathbb{C}$ which erases assertions and grades and extracts the denotation of effectful
programs, in the spirit of refinements. The benefit of using a Freyd category as
a building block is that it is more flexible than other structures (e.g., monads)
for constructing models of computational effects [47,51]. For instance, in
the category Meas of measurable spaces and measurable functions, we cannot
define state monads since there are no exponential objects. However, we can still
have a model of first-order effectful computations using Freyd categories [46].

Graded Freyd categories are a new categorical structure that we designed 
for interpreting GHL judgments (Section 4.2). The major difference from an 
ordinary Freyd category is that the ‘target’ category is now a graded category (E 
in the diagram (1)). The additional structure provides what we need in order to 
interpret judgments including grading. 

To show the generality of this structure, we present several approaches to in- 
stantiating the categorical framework of GHL’s semantics, showing constructions 
via graded monads and graded comonads preserving coproducts. 

Part of the challenge in designing a categorical semantics for GHL is to 
carve out and implement the implicit assumptions and structures used in the 
semantics of the various Hoare logics. A representative example of this challenge 
is the interpretation of the rule for conditionals in Union Bound Logic that we 
introduced above. We interpret the assertion logic in (a variant of) coherent
fibrations $p : \mathbb{P} \to \mathbb{V}$, which model the $\wedge\vee\exists{=}$-fragment of first-order predicate
logic [22]. In this abstract setup, the rule for conditionals may become unsound as
it is built on the implicit assumption that the type Bool, which is interpreted as
$1+1$, consists only of two elements, but this may fail in general $\mathbb{V}$. For example,
a suitable coherent fibration for relational Hoare logic would take $\mathbf{Set}^2$ as the
base category, but we have $\mathbf{Set}^2(1, 1+1) \cong 4$, meaning that there are four global
elements in the interpretation of Bool. We resolve this problem by introducing


5 More precisely, this is not quite a functor because E is a graded category; see Defi- 
nition 9 for the precise meaning. 
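a side condition to guarantee the decidability of the boolean expression:


\[ \frac{\vdash_m \{\psi \wedge e_b = \mathtt{tt}\}\ P_1\ \{\phi\} \qquad \vdash_m \{\psi \wedge e_b = \mathtt{ff}\}\ P_2\ \{\phi\} \qquad \psi \vdash e_b = \mathtt{tt} \vee e_b = \mathtt{ff}}{\vdash_m \{\psi\}\ \mathbf{if}\ e_b\ \mathbf{then}\ P_1\ \mathbf{else}\ P_2\ \{\phi\}} \]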


This is related to the synchronization condition appearing in the relational Hoare 
logic rule for conditional commands (e.g., [6]). 

Another challenge in the design of GHL is how to assign a grade to the
loop command $\mathbf{loop}\ e\ \mathbf{do}\ P$. We may naively give it the grade $\bigvee_{i \in \mathbb{N}} m^i$,
where $m$ is the grade of $P$, because $P$ is repeatedly executed some finite number
of times. However, this grade is a very loose over-approximation of the grade
of $\mathbf{loop}\ e\ \mathbf{do}\ P$. Even if we obtain some knowledge about the iteration count $e$
in the assertion logic, this cannot be reflected in the grade. To overcome this
problem, we introduce a Hoare logic rule that can estimate a more precise grade
of $\mathbf{loop}\ e\ \mathbf{do}\ P$, provided that the value of $e$ is determined:


\[ \frac{\forall 0 \le z < N.\ \vdash_m \{\psi_{z+1}\}\ P\ \{\psi_z\} \qquad \psi_N \vdash e_n = \lceil N \rceil}{\vdash_{m^N} \{\psi_N\}\ \mathbf{loop}\ e_n\ \mathbf{do}\ P\ \{\psi_0\}} \]


This rule brings together the assertion language and grading, creating a depen- 
dency from the former to the latter, and giving us the structure needed for a 
categorical model. The right premise is a judgment of the assertion logic (under
program variables $\Gamma_M$ and pre-condition $\psi_N$) requiring that $e_n$ is statically
determinable as $N$. This premise makes the rule difficult to use in practical
applications where $e_n$ is dynamic. We expect a more “dependent” version of this
rule is possible with a more complex semantics internalizing some form of data- 
dependency. Nevertheless, the above is enough to study the semantics of grading 
and its interaction with the Hoare Logic structure, which is our main goal here. 


3 Loop Language and Graded Hoare Logic 


After introducing some notation and basic concepts used throughout, we out- 
line a core imperative loop language, parametric in its set of basic commands 
and procedures (Section 3.2). We then define a template of an assertion logic 
(Section 3.3), which is the basis of Graded Hoare Logic (Section 3.4). 


3.1 Preliminaries 


Throughout, we fix an infinite set Var of variables which are employed in the 
loop language (as the names of mutable program variables) and in logic (to 
reason about these program variables). 

A many-sorted signature $\Sigma$ is a tuple $(S, O, ar)$ where $S, O$ are sets of sorts
and operators, and $ar : O \to S^+$ assigns argument sorts and a return value sort
to operators (where $S^+$ is the set of non-empty sequences of sorts, i.e., an operator $o$ with
signature $(s_1 \times \dots \times s_n) \to s$ is summarized as $ar(o) = \langle s_1, \dots, s_n, s \rangle \in S^+$). We
say that another many-sorted signature $\Sigma' = (S', O', ar')$ is an extension of $\Sigma$
if $S \subseteq S'$ and $O \subseteq O'$ and $ar(o) = ar'(o)$ for all $o \in O$.
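
The definition of signature extension can be made concrete with a small check. The Python sketch below is only an illustration under simplifying assumptions: sorts and operators are plain strings, and the names `Signature` and `is_extension` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Signature:
    sorts: FrozenSet[str]
    # Maps each operator name to its non-empty sort sequence (s1, ..., sn, s).
    arity: Dict[str, Tuple[str, ...]]

    def __post_init__(self) -> None:
        for op, sort_seq in self.arity.items():
            if not sort_seq:
                raise ValueError(f"operator {op!r} needs at least a return sort")
            if not set(sort_seq) <= self.sorts:
                raise ValueError(f"operator {op!r} mentions an unknown sort")


def is_extension(base: Signature, ext: Signature) -> bool:
    """Check S <= S', O <= O' and ar(o) = ar'(o) for every o in O."""
    return base.sorts <= ext.sorts and all(
        ext.arity.get(op) == sort_seq for op, sort_seq in base.arity.items()
    )


# A two-sort base signature and a richer one adding a sort and an operator.
base = Signature(frozenset({"bool", "nat"}), {"tt": ("bool",), "ff": ("bool",)})
richer = Signature(
    frozenset({"bool", "nat", "real"}),
    {"tt": ("bool",), "ff": ("bool",), "plus": ("real", "real", "real")},
)
print(is_extension(base, richer), is_extension(richer, base))
```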

Let $\Sigma = (S, \dots)$ be a many-sorted signature. A context for $\Sigma$ is a (possibly
empty) sequence of pairs $\Gamma \in (\mathrm{Var} \times S)^*$ such that all variables in $\Gamma$ are distinct.
We regard $\Gamma$ as a partial mapping from Var to $S$. The set of contexts for $\Sigma$ is
denoted $\mathrm{Ctx}_\Sigma$. For $s \in S$ and $\Gamma \in \mathrm{Ctx}_\Sigma$, we denote by $\mathrm{Exp}_\Sigma(\Gamma, s)$ the set of
$\Sigma$-expressions of sort $s$ under the context $\Gamma$. When $\Sigma, \Gamma$ are obvious, we simply
write $e : s$ to mean $e \in \mathrm{Exp}_\Sigma(\Gamma, s)$. This set is inductively defined as usual.

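An interpretation of a many-sorted signature $\Sigma = (S, O, ar)$ in a cartesian
category $(\mathbb{V}, 1, \times)$ consists of an assignment of an object $\llbracket s \rrbracket \in \mathbb{V}$ for each sort
$s \in S$ and an assignment of a morphism $\llbracket o \rrbracket \in \mathbb{V}(\llbracket s_1 \rrbracket \times \dots \times \llbracket s_n \rrbracket, \llbracket s \rrbracket)$ for
each $o \in O$ such that $ar(o) = \langle s_1, \dots, s_n, s \rangle$. Once such an interpretation is
given, we extend it to expressions in the standard way (see, e.g. [9,45]). First,
for a context $\Gamma = x_1 : s_1, \dots, x_n : s_n \in \mathrm{Ctx}_\Sigma$, by $\llbracket \Gamma \rrbracket$ we mean the product
$\llbracket s_1 \rrbracket \times \dots \times \llbracket s_n \rrbracket$. Then we inductively define the interpretation of $e \in \mathrm{Exp}_\Sigma(\Gamma, s)$
as a morphism $\llbracket e \rrbracket \in \mathbb{V}(\llbracket \Gamma \rrbracket, \llbracket s \rrbracket)$.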


Throughout, we write bullet-pointed lists marked with ⋆ for the mathematical
data that are parameters to Graded Hoare Logic (introduced in Section 3.4). 


3.2 The Loop Language 


We introduce an imperative language called the loop language, with a finite 
looping construct. The language is parameterised by the following data: 


⋆ a many-sorted signature $\Sigma = (S, O, ar)$ extending a base signature $(S_0, O_0, ar_0)$
of sorts $S_0 = \{\mathtt{bool}, \mathtt{nat}\}$ with essential constants as base operators $O_0$, shown
here with their signatures for brevity rather than defining $ar_0$ directly:


\[ O_0 = \{\mathtt{tt} : \mathtt{bool},\ \mathtt{ff} : \mathtt{bool}\} \cup \{\lceil k \rceil : \mathtt{nat} \mid k \in \mathbb{N}\} \]


where bool is used for branching control-flow and nat is used for controlling
loops, whose syntactic constructs are given below. We write $\lceil k \rceil$ to mean the
embedding of semantic natural numbers into the syntax.

⋆ a set CExp of command names (ranged over by $c$) and a set $\mathrm{PExp}_s$ of
procedure names of sort $s$ (ranged over by $p$) for each sort $s \in S$.


When giving a program, we first fix a context $\Gamma_M$ for the program variables. We
define the set of programs (under a context $\Gamma_M$) by the following grammar:


\[ P ::= P ; P \mid \mathbf{skip} \mid v := e \mid \mathbf{do}\ c \mid \mathbf{do}\ v \leftarrow p \mid \mathbf{if}\ e_b\ \mathbf{then}\ P\ \mathbf{else}\ P \mid \mathbf{loop}\ e_n\ \mathbf{do}\ P \]


where $v \in \Gamma_M$, $e_b, e_n$ are well-typed $\Sigma$-expressions of sort bool and nat under $\Gamma_M$,
and $c \in \mathrm{CExp}$. In assignment commands, $e \in \mathrm{Exp}_\Sigma(\Gamma_M, \Gamma_M(v))$. In procedure
call commands, $p \in \mathrm{PExp}_{\Gamma_M(v)}$. Each program must be well-typed under $\Gamma_M$.
The typing rules are routine so we omit them.

Thus, programs can be sequentially composed via ; with skip as the trivial
program which acts as a unit to sequencing. An assignment $v := e$ assigns
an expression to a program variable $v$. Commands can be executed through the
instruction $\mathbf{do}\ c$ which yields some side effects but does not return any value.
Procedures can be executed through a similar instruction $\mathbf{do}\ v \leftarrow p$ which yields
some side effect but also returns a value which is used to update $v$. Finally,
conditionals are guarded by a boolean expression $e_b$ and the iterations of a looping
construct are given by a natural number expression $e_n$ (which is evaluated once
at the beginning of the loop to determine the number of iterations).
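
As an informal illustration of the grammar, the constructs of the loop language can be represented as a small abstract syntax tree. The Python sketch below is not from the paper: expressions, command names and procedure calls are kept as plain strings, typing is ignored, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Seq:
    first: "Program"
    second: "Program"


@dataclass(frozen=True)
class Assign:
    var: str
    expr: str


@dataclass(frozen=True)
class DoCommand:
    command: str


@dataclass(frozen=True)
class DoProcedure:
    var: str
    procedure: str


@dataclass(frozen=True)
class If:
    cond: str
    then_branch: "Program"
    else_branch: "Program"


@dataclass(frozen=True)
class Loop:
    count: str
    body: "Program"


Program = Union[Skip, Seq, Assign, DoCommand, DoProcedure, If, Loop]


def pretty(p: Program) -> str:
    """Render a program in a concrete syntax close to the grammar."""
    if isinstance(p, Skip):
        return "skip"
    if isinstance(p, Seq):
        return f"{pretty(p.first)}; {pretty(p.second)}"
    if isinstance(p, Assign):
        return f"{p.var} := {p.expr}"
    if isinstance(p, DoCommand):
        return f"do {p.command}"
    if isinstance(p, DoProcedure):
        return f"do {p.var} <- {p.procedure}"
    if isinstance(p, If):
        return f"if {p.cond} then ({pretty(p.then_branch)}) else ({pretty(p.else_branch)})"
    if isinstance(p, Loop):
        return f"loop {p.count} do ({pretty(p.body)})"
    raise TypeError(f"not a program: {p!r}")


# The Gaussian example of Section 2, written as a tree.
example = Seq(
    DoProcedure("v1", "Gauss(0, 1)"),
    Seq(DoProcedure("v2", "Gauss(0, 1)"), Assign("v", "max(v1, v2)")),
)
print(pretty(example))
```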

This language is rather standard, except for the treatment of commands and 
procedures of which we give some examples here. 


Example 1. Cost Information: a simple example of a command is tick, which 
yields as a side effect the recording of one ‘step’ of computation. 

Control-Flow Information: two other simple examples of commands are cfTT
and cfFF, which yield as side effects the recording of either true or false to a 
log. A program can be augmented with these commands in its branches to give 
an account of a program’s control flow. We will use these commands to reason 
about control-flow security in Example 3. 

Probability Distributions: a simple example of a procedure is $\mathtt{Gauss}(x, y)$,
which yields as a side effect the introduction of new randomness in the program,
and which returns a random sample from the Gaussian distribution with mean
and variance specified by $x, y \in \Gamma_M$. We will see how to use this procedure to
reason about probability of failure in Example 4. 


Concrete instances of the loop language typically include conversion functions 
between the sorts in X, e.g., so that programs can dynamically change control 
flow depending on values of program variables. In other instances, we may have a 
language manipulating richer data types, e.g., reals or lists, and also procedures 
capturing higher-complexity computations, such as Ackermann functions. 


3.3 Assertion Logic 


We use an assertion logic to reason about properties of basic expressions. We 
regard this reasoning as a meta-level activity, thus the logic can have more sorts 
and operators than the loop language. Thus, over the data specifying the loop 
language, we build formulas of the assertion logic by the following data: 


⋆ a many-sorted signature $\Sigma_l = (S_l, O_l, ar_l)$ extending $\Sigma$.

⋆ a set $P_l$ of atomic propositions and a function $par_l : P_l \to S_l^*$ assigning input
sorts to them. We then inductively define the set $\mathrm{Fml}_{\Sigma_l}(\Gamma)$ of formulas under
$\Gamma \in \mathrm{Ctx}_{\Sigma_l}$ as in Figure 1, ranged over by $\psi$ and $\phi$.

⋆ a $\mathrm{Ctx}_{\Sigma_l}$-indexed family of subsets $\mathrm{Axiom}(\Gamma) \subseteq \mathrm{Fml}_{\Sigma_l}(\Gamma) \times \mathrm{Fml}_{\Sigma_l}(\Gamma)$.


The assertion logic is a fragment of the many-sorted first-order logic over $\Sigma_l$-terms
admitting: 1) finite conjunctions, 2) countable disjunctions, 3) existential
quantification, and 4) equality predicates. Judgements in the assertion logic have
the form $\Gamma \mid \psi_1, \dots, \psi_n \vdash \phi$ (read as $\psi_1 \wedge \dots \wedge \psi_n$ implies $\phi$), where $\Gamma \in \mathrm{Ctx}_{\Sigma_l}$
is a context giving types to variables in the formulas $\psi_1, \dots, \psi_n, \phi \in \mathrm{Fml}_{\Sigma_l}(\Gamma)$.
The logic has the axiom rule deriving $\Gamma \mid \psi \vdash \phi$ for each pair $(\psi, \phi)$ of formulas
in $\mathrm{Axiom}(\Gamma)$. The rest of the inference rules of this logic are fairly standard and so
we omit them (see e.g. [22, Section 3.2 and Section 4.1]).


242 M. Gaboardi et al. 


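The set $\mathrm{Fml}_{\Sigma_l}(\Gamma)$ of formulas under $\Gamma \in \mathrm{Ctx}_{\Sigma_l}$ is inductively defined as follows:

1. For all $p \in P_l$ with $par_l(p) = s_1 \cdots s_n$ and $t_i \in \mathrm{Exp}_{\Sigma_l}(\Gamma, s_i)$ ($1 \le i \le n$), we have
$p(t_1, \dots, t_n) \in \mathrm{Fml}_{\Sigma_l}(\Gamma)$.

2. For all $s \in S_l$ and $t, u \in \mathrm{Exp}_{\Sigma_l}(\Gamma, s)$, we have $t = u \in \mathrm{Fml}_{\Sigma_l}(\Gamma)$.

3. For all finite families $\{\phi_i \in \mathrm{Fml}_{\Sigma_l}(\Gamma)\}_{i \in \Lambda}$, we have $\bigwedge \phi_i \in \mathrm{Fml}_{\Sigma_l}(\Gamma)$.

4. For all countable families $\{\phi_i \in \mathrm{Fml}_{\Sigma_l}(\Gamma)\}_{i \in \Lambda}$, we have $\bigvee \phi_i \in \mathrm{Fml}_{\Sigma_l}(\Gamma)$.

5. For all $\phi \in \mathrm{Fml}_{\Sigma_l}(\Gamma, x : s)$, we have $(\exists x : s .\ \phi) \in \mathrm{Fml}_{\Sigma_l}(\Gamma)$.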


Fig. 1. Formula formation rules 


In some of our examples we will use the assertion logic to reason about 
programs in a relational way, i.e., to reason about two executions of a program 
(we call them left and right executions). This requires basic predicates to manage 
expressions representing pairs of values in our assertion logic. As an example, we 
could have two predicates $\mathrm{eqv}_{\langle 1 \rangle}$, $\mathrm{eqv}_{\langle 2 \rangle}$ that can assert the equality of the left
and right executions of an expression to some value, respectively. That is, the
formula $\mathrm{eqv}_{\langle 1 \rangle}(e_b, \mathtt{true})$, which we will write using infix notation $e_b\langle 1 \rangle = \mathtt{true}$,
asserts that the left execution of the boolean expression $e_b$ is equal to true.


3.4 Graded Hoare Logic 
We now introduce Graded Hoare Logic (GHL), specified by the following data: 


⋆ a preordered monoid $(M, \le, 1, \cdot)$ (pomonoid for short), where $\cdot$ is monotonic
with respect to $\le$, for the purposes of program analysis; we refer to
the elements $m \in M$ as grades;

⋆ two functions which define the grades and pre- and post-conditions of
commands CExp and procedures PExp:


\[ C_c : \mathrm{Fml}_{\Sigma_l}(\Gamma_M) \times M \to 2^{\mathrm{CExp}} \]
\[ C_p^s : \mathrm{Fml}_{\Sigma_l}(\Gamma_M) \times M \times \mathrm{Fml}_{\Sigma_l}(r : s) \to 2^{\mathrm{PExp}_s} \qquad (s \in S \wedge r \notin \mathrm{dom}(\Gamma_M)) \]


The function $C_c$ takes a pre-condition and a grade, returning a set of command
symbols satisfying these specifications. A command $c$ may appear in $C_c(\phi, m)$ for
different pairs $(\phi, m)$, enabling pre-condition-dependent grades to be assigned to
$c$. Similarly, the function $C_p^s$ takes a pre-condition, a grade, and a postcondition
for return values, and returns a set of procedure names of sort $s$ satisfying these
specifications. Note that $r$ is a distinguished variable (for return values) not in $\Gamma_M$.
The shape of $C_c$ and $C_p^s$ as predicates over commands and procedures, indexed
by assertions and grades, provides a way to link grades and assertions for the
effectful operations of GHL. Section 3.5 gives examples exploiting this.

From this structure we define a graded Hoare logic by judgments of the form
$\vdash_m \{\phi\}\ P\ \{\psi\}$, denoting a program $P$ with pre-condition $\phi \in \mathrm{Fml}_{\Sigma_l}(\Gamma_M)$,
post-condition $\psi \in \mathrm{Fml}_{\Sigma_l}(\Gamma_M)$ and analysis $m \in M$. Graded judgments are defined
inductively via the inference rules given in Table 1. Ignoring grading, many of the 
rules are fairly standard for a Floyd-Hoare program logic. The rule for skip is 
standard but includes grading by the unit 1 of the monoid. Similarly, assignment 
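
\[ \frac{}{\vdash_1 \{\psi\}\ \mathbf{skip}\ \{\psi\}} \qquad \frac{\vdash_m \{\psi\}\ P_1\ \{\psi_1\} \qquad \vdash_{m'} \{\psi_1\}\ P_2\ \{\phi\}}{\vdash_{m \cdot m'} \{\psi\}\ P_1; P_2\ \{\phi\}} \qquad \frac{}{\vdash_1 \{\psi[e/v]\}\ v := e\ \{\psi\}} \]


\[ \frac{c \in C_c(\psi, m)}{\vdash_m \{\psi\}\ \mathbf{do}\ c\ \{\psi\}} \qquad \frac{p \in C_p^{\Gamma_M(v)}(\psi, m, \phi)}{\vdash_m \{\psi\}\ \mathbf{do}\ v \leftarrow p\ \{(\exists v : \Gamma_M(v) .\ \psi) \wedge \phi[v/r]\}} \]


\[ \frac{\Gamma_M \mid \psi' \vdash \psi \qquad m \le m' \qquad \Gamma_M \mid \phi \vdash \phi' \qquad \vdash_m \{\psi\}\ P\ \{\phi\}}{\vdash_{m'} \{\psi'\}\ P\ \{\phi'\}} \]


\[ \frac{\forall 0 \le z < N.\ \vdash_m \{\psi_{z+1}\}\ P\ \{\psi_z\} \qquad \Gamma_M \mid \psi_N \vdash e_n = \lceil N \rceil}{\vdash_{m^N} \{\psi_N\}\ \mathbf{loop}\ e_n\ \mathbf{do}\ P\ \{\psi_0\}} \]


\[ \frac{\vdash_m \{\psi \wedge e_b = \mathtt{tt}\}\ P_1\ \{\phi\} \qquad \vdash_m \{\psi \wedge e_b = \mathtt{ff}\}\ P_2\ \{\phi\} \qquad \Gamma_M \mid \psi \vdash e_b = \mathtt{tt} \vee e_b = \mathtt{ff}}{\vdash_m \{\psi\}\ \mathbf{if}\ e_b\ \mathbf{then}\ P_1\ \mathbf{else}\ P_2\ \{\phi\}} \]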

Table 1. Graded Hoare Logic Inference Rules 


is standard, but graded with 1 since we do not treat it specially in GHL. Sequential
composition takes the monoid multiplication of the grades of the subterms.
The rules for commands and procedures use the functions $C_c$ and $C_p$ introduced
above. Notice that the rule for commands uses the pre-condition as its
post-condition, since commands have only side effects and they do not return any
value. The rule for procedures combines the pre- and post-conditions given by
$C_p$ following the style of Floyd’s assignment rule [12].

The non-syntax-directed consequence rule is similar to the usual consequence
rule, and in addition allows the assumption on the grade to be weakened
(approximated) according to the ordering of the monoid.

The shape of the loop rule is slightly different from the usual one. It uses the
assertion-logic judgment $\Gamma_M \mid \psi_N \vdash e_n = \lceil N \rceil$ to express the assumption that $e_n$
evaluates to $\lceil N \rceil$. Under this assumption it uses a family of assertions $\psi_z$ indexed
by the natural numbers $z \in \{0, 1, \dots, N-1\}$ to conclude the post-condition $\psi_0$.
This family of assertions plays the role of the classical invariant in the
Floyd-Hoare logic rule for ‘while’. Assuming that the grade of the loop body is $m$, the
grade of the loop command is then $m^N$, where $m^0 = 1$ and $m^{k+1} = m \cdot m^k$. By
instantiating this rule with $\psi_z = (\theta \wedge e_n = \lceil z \rceil)$, the loop rule also supports the
following derived rule which is often preferable in examples:


\[ \frac{\forall 0 \le z < N.\ \vdash_m \{\theta \wedge e_n = \lceil z+1 \rceil\}\ P\ \{\theta \wedge e_n = \lceil z \rceil\}}{\vdash_{m^N} \{\theta \wedge e_n = \lceil N \rceil\}\ \mathbf{loop}\ e_n\ \mathbf{do}\ P\ \{\theta \wedge e_n = \lceil 0 \rceil\}} \]
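
The grade $m^N$ of a loop can be computed directly from the recurrence $m^0 = 1$, $m^{k+1} = m \cdot m^k$. The following Python sketch illustrates this for an additive monoid of natural numbers, chosen here only as an assumed example instance; the class name `Pomonoid` and its fields are hypothetical and not part of GHL.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

M = TypeVar("M")


@dataclass(frozen=True)
class Pomonoid(Generic[M]):
    unit: M
    mul: Callable[[M, M], M]
    leq: Callable[[M, M], bool]

    def power(self, m: M, n: int) -> M:
        """Compute m^n using m^0 = unit and m^(k+1) = m * m^k."""
        if n < 0:
            raise ValueError("iteration count must be non-negative")
        result = self.unit
        for _ in range(n):
            result = self.mul(m, result)
        return result


# Additive natural numbers: the monoid 'multiplication' is +, the unit is 0.
additive = Pomonoid(unit=0, mul=lambda a, b: a + b, leq=lambda a, b: a <= b)

body_grade = 3   # grade m of the loop body
iterations = 4   # the statically known iteration count N
loop_grade = additive.power(body_grade, iterations)
print(loop_grade)
```

In this instance the loop grade is simply the body grade added to itself once per iteration, and a loop with zero iterations receives the unit grade.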


The rule for the conditional is standard except for the condition
$\Gamma_M \mid \psi \vdash e_b = \mathtt{tt} \vee e_b = \mathtt{ff}$. While this condition may seem obvious, it is actually important to
make GHL sound in various semantics (mentioned in Section 2). As an example,
suppose that a semantics $\llbracket - \rrbracket$ of expressions is given in the product category
$\mathbf{Set}^2$, which corresponds to two semantics $\llbracket - \rrbracket_1, \llbracket - \rrbracket_2$ of expressions in Set. Then
the side condition for the conditional is to guarantee that for any boolean
expression $e_b$ and pair of memories $(\rho_1, \rho_2)$ satisfying the precondition $\psi$, the pair
$(\llbracket e_b \rrbracket_1(\rho_1), \llbracket e_b \rrbracket_2(\rho_2))$ is either $\llbracket \mathtt{tt} \rrbracket = (\mathtt{tt}, \mathtt{tt})$ or $\llbracket \mathtt{ff} \rrbracket = (\mathtt{ff}, \mathtt{ff})$. We note that other
relational logics such as apRHL [6] employ an equivalent syntactic side condition
in their rule for conditionals.
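
To see why the side condition matters in the relational setting, one can enumerate the possible values of a boolean expression when it is interpreted as a pair of executions. The Python sketch below is an informal illustration, with hypothetical names, that models a global element of the boolean type in the product category as a pair of ordinary booleans.

```python
from itertools import product

BOOL = ("tt", "ff")

# A global element of 1+1 in the product category is modelled here as a pair
# of ordinary boolean values, one for each execution.
global_elements = list(product(BOOL, BOOL))


def satisfies_side_condition(pair):
    """True when both executions agree, i.e. the pair is (tt, tt) or (ff, ff)."""
    left, right = pair
    return left == right


synchronised = [pair for pair in global_elements if satisfies_side_condition(pair)]
print(len(global_elements), synchronised)
```

Only the two pairs on which both executions agree satisfy the side condition; the mixed pairs are exactly the cases the conditional rule must exclude.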


3.5 Example Instantiations of GHL 


Example 2 (Simple cost analysis). We can use the tick command discussed in
Example 1 to instrument programs with cost annotations. We can then use
GHL to perform cost analysis by instantiating GHL with the additive natural
number monoid (ℕ, ≤, 0, +) and tick ∈ C_c(φ, 1). Thus, we can form judgments
⊢_1 {φ} do tick {φ} which account for cost via the judgment’s grade. Sequential
composition accumulates cost and terms like skip and assignment have 0 cost.

Let us use this example to illustrate how C_c can assign multiple pairs of
pre-condition and grade to a command. Suppose that we modify the semantics of tick
so that it reports unit cost 1 when variable x is 0, otherwise cost 2. We can then
define C_c so that tick ∈ C_c(x = ⌈0⌉, 1) and also tick ∈ C_c(x ≠ ⌈0⌉, 2). In this
way, we can give different grades to programs depending on their pre-conditions.
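
The cost instantiation can be illustrated with a small executable sketch. It is not part of GHL itself: the tuple-based program syntax and the names `cost_of_tick` and `run` are assumptions made only for this illustration. The sketch runs a program and accumulates cost in the monoid (ℕ, 0, +), using the modified tick whose cost depends on whether x is 0.

```python
# Hypothetical mini-syntax (an assumption of this sketch, not the paper's):
# ("skip",), ("assign", var, value), ("tick",), ("seq", P1, P2)

def cost_of_tick(memory):
    # Modified tick from the text: cost 1 when x is 0, otherwise cost 2.
    return 1 if memory.get("x", 0) == 0 else 2

def run(program, memory):
    """Return (final memory, accumulated cost) in the monoid (N, 0, +)."""
    tag = program[0]
    if tag == "skip":
        return memory, 0                      # skip has cost 0
    if tag == "assign":
        _, var, value = program
        return {**memory, var: value}, 0      # assignment has cost 0
    if tag == "tick":
        return memory, cost_of_tick(memory)
    if tag == "seq":
        _, first, second = program
        mid, c1 = run(first, memory)
        out, c2 = run(second, mid)
        return out, c1 + c2                   # sequencing accumulates cost
    raise ValueError(tag)

prog = ("seq", ("tick",), ("seq", ("assign", "x", 5), ("tick",)))
final_memory, total_cost = run(prog, {"x": 0})
```

Under the pre-condition x = ⌈0⌉ the first tick costs 1 and, after the assignment, the second costs 2, so any grade m with 3 ≤ m is a valid bound for this run.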


Example 3 (Program Counter Security). We can use the commands cfTT and 
cfFF discussed in Example 1 to instrument programs with control flow anno- 
tations, recording to an external log. GHL can then be used to reason about 
program counter security [35][3, Section 7.2] of instrumented programs. This is 
a relational security property similar to non-interference (requiring that private 
values do not influence public outputs) but where only programs with the same 
control flow are considered. 

Firstly, any conditional statement if e_b then P_t else P_f in a program is
elaborated to a statement if e_b then (cfTT; P_t) else (cfFF; P_f). We then instantiate
GHL with a monoid of words over {tt, ff} with prefix order:
2* ≜ ({tt, ff}*, ≤, ε, ·), and we consider cfTT ∈ C_c(φ, tt) and cfFF ∈ C_c(φ, ff).
We can thus form judgments of the shape ⊢_tt {φ} do cfTT {φ} and ⊢_ff {φ} do cfFF {φ}
which account for control-flow information (forming paths) via the judgment’s grade.
Sequential composition concatenates control-flow paths and terms like skip and
assignment do not provide any control-flow information, i.e., their grade is ε.

We then instantiate the assertion logic to support relational reasoning, i.e.,
where the expressions of the language are interpreted as pairs of values. For an
expression e interpreted as a pair (v₁, v₂), we write e(1) = v₁ to say that
the first component (left execution) equals v₁ and e(2) = v₂ to say that the
second component (right execution) equals v₂. In the assertion logic, we can
then describe public values which need to be equal, following the tradition in
reasoning about non-interference, by the predicate e(1) = e(2). Private data
are instead interpreted as a pair of arbitrary values. (Section 3.3 suggested the
notation eqv_(i)(e, b) for e(i) = b, but we use the latter for compactness here.)

As an example, one can prove the following judgment where x is a public
variable and y is a private one, and b ∈ {tt, ff}:

\[
\vdash_b \{x(1) = x(2) \wedge x(1) = b\}\ \mathtt{if}\ x\ \mathtt{then}\ (\mathtt{cfTT};\ x = 1;\ y = 1)\ \mathtt{else}\ (\mathtt{cfFF};\ x = 2;\ y = 2)\ \{x(1) = x(2)\}
\]


This judgment shows the program is non-interferent, since the value of x is 
independent from the value of the private variable y, and secure in the pro- 
gram counter model, since the control flow does not depend on the value of y. 
Conversely, the following judgment is derivable for neither b = tt nor b = ff:


\[
\vdash_b \{x(1) = x(2) \wedge y(1) = b\}\ \mathtt{if}\ y\ \mathtt{then}\ (\mathtt{cfTT};\ x = 1;\ y = 1)\ \mathtt{else}\ (\mathtt{cfFF};\ x = 1;\ y = 2)\ \{x(1) = x(2)\}
\]


This program is non-interferent but is not secure in the program counter model 
because the control flow leaks information about y which is a private variable. 
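
The difference between the two programs can be observed by running them on a pair of memories that agree on the public variable x but differ on the private variable y. The following is only an illustrative sketch: the tuple encoding of programs and the helper names `run` and `check` are assumptions of this example, and running two executions tests one pair of inputs rather than proving the relational judgment.

```python
def run(program, memory):
    """Execute an instrumented program; return (memory, control-flow trace)."""
    mem, trace = dict(memory), []

    def step(p):
        tag = p[0]
        if tag == "assign":
            mem[p[1]] = p[2]
        elif tag == "cfTT":
            trace.append("tt")        # log that the 'then' branch was taken
        elif tag == "cfFF":
            trace.append("ff")        # log that the 'else' branch was taken
        elif tag == "seq":
            for q in p[1:]:
                step(q)
        elif tag == "if":
            step(p[2] if mem[p[1]] else p[3])
        else:
            raise ValueError(tag)

    step(program)
    return mem, tuple(trace)

then_branch = ("seq", ("cfTT",), ("assign", "x", 1), ("assign", "y", 1))
branch_on_x = ("if", "x", then_branch,
               ("seq", ("cfFF",), ("assign", "x", 2), ("assign", "y", 2)))
branch_on_y = ("if", "y", then_branch,
               ("seq", ("cfFF",), ("assign", "x", 1), ("assign", "y", 2)))

def check(program, left, right):
    """Return (public outputs agree, control-flow traces agree)."""
    (m1, t1), (m2, t2) = run(program, left), run(program, right)
    return m1["x"] == m2["x"], t1 == t2

left = {"x": True, "y": True}      # the two runs agree on public x ...
right = {"x": True, "y": False}    # ... and differ on private y
on_x = check(branch_on_x, left, right)
on_y = check(branch_on_y, left, right)
```

For this pair of memories the first program agrees on both the public output and the trace, while the second agrees on the public output only, matching the discussion above.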


Example 4 (Union Bound Logic). Section 1 discussed the Union Bound logic by
Barthe et al. [5]. This logic embeds smoothly into GHL by using the pomonoid
(ℝ≥0, ≤, 0, +) and procedures of the form sample_{μ,e} as samplings from a
probabilistic distribution μ parametrised over the syntax of GHL expressions e.
Following Barthe et al. [5], we consider a semantically defined set for C_p:


\[
C_p(\phi, \beta, \psi) = \{\, \mathtt{sample}_{\mu,e} \mid \forall s.\ s \in [\![\phi]\!] \implies \Pr_{s' \leftarrow [\![\mathtt{sample}_{\mu,e}]\!](s)}\bigl[s' \in [\![\neg\psi]\!]\bigr] \le \beta \,\}
\]


This definition captures that, assuming the pre-condition holds for an input
memory state s, then for an output value s′ sampled from sample_{μ,e} the
probability that the post-condition is false is bounded above by β. This allows us to
consider different properties of the distribution with parameter e.


4 Graded Categories 


Now that we have introduced GHL and key examples, we turn to the core of its 
categorical semantics: graded categories. 

Graded monads provide a notion of sequential composition for morphisms of
the form I → T m J, i.e., with structure on the target/output capturing some
information by the grade m drawn from a pomonoid [24]; dually, graded comonads
provide composition for D m I → J, i.e., with structure on the source/input with
grade m [43]. We avoid the choice of whether to associate grading with the input
or output by instead introducing graded categories, which are agnostic about the
polarity (or position) of any structure and grading. Throughout this section, we
fix a pomonoid (M, ≤, 1, ·) (with · monotonic with respect to ≤).


Definition 1. An M-graded category C consists of the following data: 


— A class Obj(C) of objects. I ∈ C denotes I ∈ Obj(C).

— A homset C(I, J)(m) for all objects I, J ∈ C and m ∈ M. We often write
f : I →_m J to mean f ∈ C(I, J)(m), and call m the grade of f.

— An upcast function ↑_m^n : C(I, J)(m) → C(I, J)(n) for all grades m ≤ n.

— Identity morphisms id_I ∈ C(I, I)(1) for all I ∈ C.

— Composition ∘ : C(J, K)(n) × C(I, J)(m) → C(I, K)(m·n).


Graded categories satisfy the usual categorical laws of identity and associativity,
and also the commutativity of upcast and composition:
↑_n^{n′} g ∘ ↑_m^{m′} f = ↑_{m·n}^{m′·n′} (g ∘ f),
corresponding to monotonicity of (·) with respect to ≤.
An intuitive meaning of a graded category’s morphisms is: f ∈ C(A, B)(m) if
the value or the price of a morphism f : A → B is at most m with respect to the
ordering ≤ on M. We do not yet give a polarity or direction to this price, i.e.,
whether the price is consumed or produced by the computation. Thus, graded
categories give a non-biased view; we need not specify whether grading relates
to the source or target of a morphism.

Graded categories were first introduced by Wood [54, Section 1] (under the 
name ‘large V-categories’), and Levy connected them with models of call-by- 
push-value [28]. Therefore we do not claim the novelty of Definition 1. 


Example 5. A major source of graded categories is via graded (co)monads. Let
(M, ≤, 1, ·) be a pomonoid, regarded as a monoidal category. A graded monad
[50,24] on a category C (or more precisely an M-graded monad) is a lax monoidal
functor (T, η, μ) : (M, ≤, 1, ·) → ([C, C], Id, ∘). Concretely, this specifies:


— a functor T : (M, ≤) → [C, C] from the preordered set (M, ≤) to the
endofunctor category over C. For an ordered pair m ≤ m′ in M,
T(m ≤ m′) : T m → T m′ is a natural transformation;

— a unit η : Id → T 1 and a multiplication μ_{m,m′} : T m ∘ T m′ → T(m·m′),
natural in m, m′ ∈ M.


They satisfy the graded versions of the usual monad axioms:


\[
\mu_{1,m,J} \circ \eta_{TmJ} = \mathrm{id}_{TmJ} = \mu_{m,1,J} \circ Tm(\eta_J),
\qquad
\mu_{m,\,m' \cdot m'',\,J} \circ Tm(\mu_{m',m'',J}) = \mu_{m \cdot m',\,m'',\,J} \circ \mu_{m,m',Tm''J}.
\]


Graded comonads are dually defined (i.e., as a graded monad on C^op).

By mimicking the construction of Kleisli categories, we can construct an
M-graded category C_T (we call it the Kleisli M-graded category of T) from a
category C with an M-graded monad T on C.⁶


— Obj(C_T) = Obj(C) and C_T(X, Y)(m) = C(X, T m Y).

— For f : X →_m Y and n such that m ≤ n, we define ↑_m^n f = T(m ≤ n)_Y ∘ f.

— Identity and composition are defined by: id_X ≜ η_X : X →_1 X and
g ∘ f ≜ μ_{m,n,Z} ∘ T m g ∘ f for f : X →_m Y and g : Y →_n Z.


The dual construction is possible. Let D be an M^op-graded comonad on a
category C. We then define C_D by C_D(X, Y)(m) = C(D m X, Y); the rest of the data is
similar to the case of graded monads. This yields an M-graded category C_D.
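
As a concrete reading of the Kleisli M-graded category, the following sketch takes the pomonoid (ℕ, ≤, 0, +) and a cost-style graded monad whose morphisms return a value together with a cost bounded by the grade. The class `Graded` and the helper names are assumptions introduced for this illustration only; they are not notation from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass(frozen=True)
class Graded:
    grade: int                                # m in the pomonoid (N, <=, 0, +)
    run: Callable[[Any], Tuple[Any, int]]     # X -> (value, cost), cost <= grade

def identity():
    # The unit eta: return the input at the unit grade (here 0).
    return Graded(0, lambda x: (x, 0))

def compose(g, f):
    """Kleisli composition g o f; the grades combine by the monoid operation."""
    def run(x):
        y, c1 = f.run(x)
        z, c2 = g.run(y)
        return z, c1 + c2
    return Graded(f.grade + g.grade, run)

def upcast(f, n):
    """Upcast from grade m to grade n, defined only when m <= n."""
    if not f.grade <= n:
        raise ValueError("upcast requires m <= n")
    return Graded(n, f.run)

double = Graded(1, lambda x: (2 * x, 1))
succ = Graded(2, lambda x: (x + 1, 2))
h = compose(succ, double)                     # grade 1 + 2
```

Upcasting both factors before composing yields the same grade as upcasting the composite, which is the commutativity of upcast and composition from Definition 1 in this instance.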


Remark 1. As an aside (included for completeness but not needed in the rest 
of the paper), graded categories are an instance of enriched categories. For the 
enriching category, we take the presheaf category [M, Set], together with Day’s
convolution product [10]. 


6 Not to be confused with the Kleisli category of graded monads by Fujii et al. [13]. 
4.1 Homogeneous Coproducts in Graded Categories


We model boolean values and natural numbers by the binary coproduct 1 + 1
and the countable coproduct ∐_{i∈ℕ} 1. We thus define what it means for a graded
category to have coproducts. The following definition of binary coproducts easily
extends to coproducts of families of objects.


Definition 2. Let C be an M-graded category. A homogeneous binary coproduct
of X₁, X₂ ∈ C consists of an object Z ∈ C together with injections
ι₁ ∈ C(X₁, Z)(1) and ι₂ ∈ C(X₂, Z)(1) such that, for any m ∈ M and Y ∈ C, the
function λf . (f ∘ ι₁, f ∘ ι₂) of type C(Z, Y)(m) → C(X₁, Y)(m) × C(X₂, Y)(m) is
invertible. The inverse is called the cotupling and denoted by [−, −]. It satisfies
the usual laws of coproducts (i = 1, 2):

\[
[f_1, f_2] \circ \iota_i = f_i, \qquad [\iota_1, \iota_2] = \mathrm{id}_Z, \qquad
g \circ [f_1, f_2] = [g \circ f_1, g \circ f_2], \qquad
[\uparrow_m^n f_1, \uparrow_m^n f_2] = \uparrow_m^n [f_1, f_2].
\]


When homogeneous binary coproducts of any combination of X₁, X₂ ∈ C exist,
we say that C has homogeneous binary coproducts.


The difference between homogeneous coproducts and coproducts in ordinary 
category theory is that the cotupling is restricted to take morphisms with the 
same grade. A similar constraint is seen in some effect systems, where the typing 
rule for conditional expressions requires each branch to have the same effect.


Proposition 1. Let {ι_i ∈ C(X_i, Z)}_{i∈I} be a coproduct of {X_i}_{i∈I} in an ordinary
category C.


1. Suppose that T is an M-graded monad on C. Then {η_Z ∘ ι_i ∈ C_T(X_i, Z)(1)}_{i∈I}
is a homogeneous coproduct in C_T.

2. Suppose that (D, ε, δ) is an M^op-graded comonad on C such that each D m :
C → C preserves the coproduct {ι_i}_{i∈I}. Then {ι_i ∘ ε_{X_i} ∈ C_D(X_i, Z)(1)}_{i∈I} is
a homogeneous coproduct in C_D.


4.2 Graded Freyd Categories with Countable Coproducts 


We now introduce the central categorical structure of the loop language and GHL 
semantics: graded Freyd categories with homogeneous countable coproducts. 


Definition 3. An M-graded Freyd category with homogeneous countable co- 
products consists of the following data: 


1. A cartesian monoidal category (V, 1, ×, l, r, a) with countable coproducts
such that for all V ∈ V, the functor V × (−) : V → V preserves coproducts.

2. An M-graded category C such that Obj(C) = Obj(V) and C has homogeneous
countable coproducts.

3. A function I_{V,W} : V(V, W) → C(V, W)(1) for each V, W ∈ C. Below we
may omit writing subscripts of I. The role of this function is to inject pure
computations into effectful computations.

4. A function (∗)_{V,X,W,Y} : V(V, W) × C(X, Y)(m) → C(V × X, W × Y)(m) for
each V, W, X, Y ∈ C and m ∈ M. Below we use it as an infix operator and
sometimes omit its subscripts. The role of this function is to combine pure
computations and effectful computations in parallel.


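The functions I and (∗) satisfy the following equations:


\[
\begin{gathered}
I(\mathrm{id}_X) = \mathrm{id}_X, \qquad I(g \circ f) = Ig \circ If, \qquad I(f \times g) = f * Ig, \qquad \mathrm{id}_V * \mathrm{id}_X = \mathrm{id}_{V \times X}, \\
(g \circ f) * (i \circ j) = (g * i) \circ (f * j), \qquad f * {\uparrow_m^n g} = {\uparrow_m^n (f * g)}, \\
f \circ I(l_X) = I(l_Y) \circ (\mathrm{id}_1 * f), \qquad I(a_{X',Y',Z'}) \circ ((f \times g) * h) = (f * (g * h)) \circ I(a_{X,Y,Z}).
\end{gathered}
\]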


These are analogous to the usual Freyd categories axioms. We also require that: 


1. For any countable coproduct {ι_i ∈ V(X_i, Y)}_{i∈A}, {I(ι_i) ∈ C(X_i, Y)(1)}_{i∈A}
is a homogeneous countable coproduct.

2. For any homogeneous countable coproduct {ι_i ∈ C(X_i, Y)(1)}_{i∈A} and V ∈
V, {id_V ∗ ι_i ∈ C(V × X_i, V × Y)(1)}_{i∈A} is a homogeneous countable coproduct.


We denote an M-graded Freyd category with countable coproducts by the tuple
(V, 1, ×, C, I, (∗)) capturing the main details of the cartesian monoidal structure
of V, the base category C, the lifting function I and the action (∗).


If the grading pomonoid M is trivial, C becomes an ordinary category with
countable coproducts. We therefore simply call it a Freyd category with
countable coproducts. This is the same as a distributive Freyd category in the sense
introduced by Power [46] and Staton [51]. We will use non-graded Freyd
categories to give a semantics of the loop language in Section 4.3. An advantage
of Freyd categories is that they encompass a broad class of models of
computations, not limited to those arising from monads. A recent such example is
Staton’s category of s-finite kernels [52].⁷

We could give an alternative abstract definition of M-graded Freyd category 
using 2-categorical language: a graded Freyd category is an equivariant morphism 
in the category of actions from a cartesian category to M-graded categories. The 
full detail of this formulation will be discussed elsewhere. 

A Freyd category typically arises from a strong monad on a cartesian category
[47]. We give here a graded analogue of this fact. First, we recall the notion
of strength for graded monads [24, Definition 2.5]. Let (C, 1, ×) be a cartesian
monoidal category. A strong M-graded monad is a pair of an M-graded monad
(T, η, μ) and a natural transformation st_{I,J,m} ∈ C(I × T m J, T m (I × J)) satisfying
graded versions of the four coherence laws in [34, Definition 3.2]. We dually
define a costrong M-graded comonad (D, ε, δ, cs) to be an M-graded comonad
equipped with the costrength cs_{I,J,m} ∈ C(D m (I × J), I × D m J).


Proposition 2. Let (C, 1, ×) be a cartesian monoidal category.


1. Let (T, η, μ, st) be a strong M-graded monad on C. The Kleisli M-graded
category C_T, together with I f = η_W ∘ f and f ∗ g = st_{W,Y} ∘ (f × g), forms
an M-graded Freyd category with homogeneous countable coproducts.


7 It is not known whether the category of s-finite kernels is a Kleisli category.
2. Let (D, ε, δ, cs) be a costrong M^op-graded comonad on C such that each D m
preserves countable coproducts. Then the coKleisli M-graded category C_D,
together with I f = f ∘ ε_V and f ∗ g = (f × g) ∘ cs_{V,X}, forms an M-graded
Freyd category with homogeneous countable coproducts.


We often use the following ‘ext’ operation to structure interpretations of
programs and GHL derivations. Let δ_X ∈ V(X, X × X) be the diagonal morphism.
Then ext : C(X, Y)(m) → C(X, X × Y)(m) is defined as ext(f) = (id_X ∗ f) ∘ I(δ_X).
When viewing X as a set of environments, ext(f) may be seen as executing an
effectful procedure f under an environment, then extending the environment with
the return value of f. In a non-graded setting, the definition of ext is analogous.
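
The environment-extending reading of ext can be sketched as follows, assuming (for illustration only) that an effectful morphism is a Python function returning a pair of a value and a cost. The names `ext` and `count_items` are hypothetical.

```python
def ext(f):
    """Given f : X -> (Y, cost), return X -> ((X, Y), cost).

    The environment is duplicated by the diagonal, f runs on one copy,
    and the other copy is kept alongside the result of f.
    """
    def run(env):
        value, cost = f(env)
        return (env, value), cost
    return run

# A hypothetical effectful procedure: reads the environment, costs 1.
count_items = lambda env: (len(env["xs"]), 1)

extended = ext(count_items)({"xs": [1, 2, 3]})
```

The effect (here the cost) of ext(f) is exactly that of f; only the returned value is enlarged with the unchanged environment.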


4.3 Semantics of The Loop Language in Freyd Categories 


Towards the semantics of GHL, we first give a more standard, non-graded cate- 
gorical semantics of the loop language. We first prepare the following data. 


— A Freyd category (V, 1, ×, C, I, ∗) with countable coproducts.
— A coproduct {tt, ff ∈ V(1, Bool)} of 1 and 1 in V.

— A coproduct {⌈k⌉ ∈ V(1, Nat)}_{k∈ℕ} of ℕ-many 1s in V.

— An interpretation ⟦−⟧ of Σ in V such that


\[
\begin{gathered}
[\![\mathtt{bool}]\!] = \mathrm{Bool}, \qquad [\![\mathtt{tt}]\!] = \mathrm{tt} \in V(1, \mathrm{Bool}), \qquad [\![\mathtt{ff}]\!] = \mathrm{ff} \in V(1, \mathrm{Bool}), \\
[\![\mathtt{nat}]\!] = \mathrm{Nat}, \qquad [\![\lceil k \rceil]\!] = \lceil k \rceil \in V(1, \mathrm{Nat}).
\end{gathered}
\]


For convenience, we let M ≜ ⟦Γ_M⟧ (Section 3.1), i.e., all relevant (mutable)
program variables are in scope, and write π_v ∈ V(M, ⟦Γ_M(v)⟧) for the projection
morphism associated to a program variable v ∈ Γ_M.

Pure expressions are interpreted as V-morphisms and impure commands and 
procedures are interpreted as C-morphisms, of the form: 


— (expressions) A morphism ⟦e⟧ ∈ V(M, ⟦s⟧) for all e ∈ Exp_Σ(Γ_M, s); see
Section 3.1.

— (commands) A morphism ⟦c⟧ ∈ C(M, 1) for each c ∈ CExp.

— (procedures) A morphism ⟦p⟧ ∈ C(M, ⟦s⟧) for each s ∈ S and p ∈ PExp_s.


For the interpretation of programs, we first define some auxiliary morphisms. For
all v ∈ Γ_M, let upd_v ∈ V(M × ⟦Γ_M(v)⟧, M) be the unique morphism (capturing
memory updates) satisfying π_v ∘ upd_v = π₂ and π_w ∘ upd_v = π_w ∘ π₁ for any w ∈ Γ_M
such that v ≠ w. We define sub(v, e) ∈ V(M, M) by sub(v, e) ≜ upd_v ∘ ⟨id_M, ⟦e⟧⟩,
which updates the memory configuration at variable v with the value of e.

For the interpretation of conditional and loop commands, we need coproducts 
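over M. Since V is distributive, we can form a binary coproduct M × Bool and a
countable coproduct M × Nat with injections respectively defined as (∀k ∈ ℕ):


\[
\begin{aligned}
\mathrm{tm} &\triangleq \langle \mathrm{id}_M, \mathrm{tt} \circ {!_M} \rangle \in V(M, M \times \mathrm{Bool}), &
[k] &\triangleq \langle \mathrm{id}_M, \lceil k \rceil \circ {!_M} \rangle \in V(M, M \times \mathrm{Nat}), \\
\mathrm{fm} &\triangleq \langle \mathrm{id}_M, \mathrm{ff} \circ {!_M} \rangle \in V(M, M \times \mathrm{Bool}).
\end{aligned}
\]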
By Condition 1 of Definition 3, these coproducts are mapped to coproducts in
C with injections:


{I(tm), I(fm) ∈ C(M, M × Bool)},   {I([k]) ∈ C(M, M × Nat) | k ∈ ℕ}.


The cotuplings of these coproducts (written [f, g] and [f(k)]_{k∈ℕ} respectively) are
used next to interpret conditionals and loops.

We interpret a program P of the loop language as a morphism ⟦P⟧ ∈ C(M, M):


\[
\begin{aligned}
[\![P ; P']\!] &= [\![P']\!] \circ [\![P]\!] &
[\![\mathtt{skip}]\!] &= \mathrm{id}_M \\
[\![\mathtt{do}\ v \leftarrow p]\!] &= I(\mathrm{upd}_v) \circ \mathrm{ext}[\![p]\!] &
[\![\mathtt{do}\ c]\!] &= I(\pi_1) \circ \mathrm{ext}[\![c]\!] \\
[\![v := e]\!] &= I(\mathrm{sub}(v, e)) \\
[\![\mathtt{if}\ e_b\ \mathtt{then}\ P\ \mathtt{else}\ P']\!] &= \bigl[[\![P]\!], [\![P']\!]\bigr] \circ \mathrm{ext}(I[\![e_b]\!]) \\
[\![\mathtt{loop}\ e_n\ \mathtt{do}\ P]\!] &= \bigl[[\![P]\!]^{(k)}\bigr]_{k \in \mathbb{N}} \circ \mathrm{ext}(I[\![e_n]\!])
\end{aligned}
\]


Thus, the semantics of loop e_n do P is such that, if the expression e_n evaluates to
some natural number ⌈k⌉, then loop e_n do P is equivalent to the k-times sequential
composition of P.
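
The clauses above can be read as a small interpreter. The sketch below assumes the simplest possible model, in which both V and C are sets and functions (no effects), memories are Python dicts, and expressions are Python functions on memories; the tuple encoding of programs and the name `interp` are assumptions of this illustration. It shows in particular that the bound e_n of a loop is evaluated once, before the body is iterated.

```python
def interp(program):
    """Map a program to a function on memories, mirroring [[P]] in C(M, M)."""
    tag = program[0]
    if tag == "skip":
        return lambda m: m
    if tag == "seq":
        first, second = interp(program[1]), interp(program[2])
        return lambda m: second(first(m))
    if tag == "assign":
        _, var, expr = program
        return lambda m: {**m, var: expr(m)}
    if tag == "if":
        _, cond, then_p, else_p = program
        t, e = interp(then_p), interp(else_p)
        return lambda m: t(m) if cond(m) else e(m)
    if tag == "loop":
        _, bound, body = program
        b = interp(body)
        def run(m):
            k = bound(m)              # e_n is evaluated once, up front
            for _ in range(k):        # then the body is composed k times
                m = b(m)
            return m
        return run
    raise ValueError(tag)

# loop n do (acc := acc + n; n := n + 1)
prog = ("loop", lambda m: m["n"],
        ("seq", ("assign", "acc", lambda m: m["acc"] + m["n"]),
                ("assign", "n", lambda m: m["n"] + 1)))
result = interp(prog)({"n": 3, "acc": 0})
```

Although the body increments n, the loop still runs exactly three times, because the cotupling is selected by the value of e_n in the initial memory.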


5 Modelling Graded Hoare Logic 


We now define the categorical model of GHL, building on the non-graded Freyd 
semantics of Section 4.3. Section 5.1 first models the base assertion logic, for 
which we use fibrations, giving an overview of the necessary mathematical ma- 
chinery for completeness. Section 5.2 then defines the semantics of GHL and 
Section 5.3 instantiates it for the examples discussed previously in Section 3. 


5.1 Interpretation of the Assertion Logic using Fibrations 


Our assertion logic (Section 3) has logical connectives of finite conjunctions, 
countable disjunctions, existential quantification and an equality predicate. A 
suitable categorical model for this fragment of first-order logic is offered by a 
coherent fibration [22, Def. 4.2.1], extended with countable joins in each fibre. 
We recap various key definitions and terminology due to Jacobs’ textbook [22]. 

In the following, let P and V be categories and p : P → V a functor.

We can regard the functor p as attaching predicates to each object in V. When
pψ = X, we regard ψ ∈ P as a predicate over X ∈ V. When f ∈ P(ψ, φ) is a
morphism, we regard this as saying that pf maps elements satisfying ψ to those
satisfying φ in V. Parallel to this view of functors assigning predicates is the
notion that entities in P are ‘above’ those in V when they are mapped to by p.


Definition 4 (‘Aboveness’). An object ψ ∈ P is said to be above an object
X ∈ V if pψ = X. Similarly, a morphism⁸ ḟ ∈ P(ψ, φ) is said to be above a
morphism f in V if pḟ = f ∈ V(pψ, pφ). A morphism in P is vertical if it is
above an identity morphism. Given ψ, φ ∈ P and f ∈ V(pψ, pφ), we denote
the set of all morphisms in P above f as P_f(ψ, φ) = {ḟ ∈ P(ψ, φ) | pḟ = f}.


8 The dot notation here introduces a new name and should not be understood as 
applying some mathematical operator on f. 
Definition 5 (Fibre category). A fibre category over X ∈ V is a subcategory
of P consisting of objects above X and morphisms above id_X. This subcategory
is denoted by P_X, and thus the homsets of P_X are P_X(ψ, φ) = P_{id_X}(ψ, φ).


We are ready to recall the central concept in fibrations: cartesian morphisms. 


Definition 6 (Cartesian morphism). A morphism ḟ ∈ P(ψ, φ) is cartesian
if for any α ∈ P and g ∈ V(pα, pψ), the post-composition of ḟ in P, regarded as
a function of type ḟ ∘ − : P_g(α, ψ) → P_{pḟ ∘ g}(α, φ), is a bijection. This amounts
to the following universal property of cartesian morphisms: for any ḣ ∈ P(α, φ)
above pḟ ∘ g, there exists a unique morphism ġ ∈ P(α, ψ) above g such that
ḣ = ḟ ∘ ġ. Intuitively, ḟ represents the situation where ψ is a pullback or inverse
image of φ along pḟ, and the universal property corresponds to that of a pullback.


Definition 7 (Fibration). Finally, a functor p : P → V is a fibration if for any
ψ ∈ P, X ∈ V, and f ∈ V(X, pψ), there exists an object φ ∈ P and a cartesian
morphism ḟ ∈ P(φ, ψ) above f, called the cartesian lifting of f with ψ. We say
that a fibration p : P → V is posetal if each P_X is a poset, corresponding to the
implicational order between predicates. When ψ ≤ φ holds in P_X, we denote the
corresponding vertical morphism in P as ψ ≤ φ.

Posetal fibrations are always faithful. The cartesian lifting of f ∈ V(X, pψ)
with ψ uniquely exists. We thus write it as f̅ψ, and its domain as f*ψ. It
can be easily shown that for any morphism f ∈ V(X, Y) in V, the assignment
ψ ∈ P_Y ↦ f*ψ ∈ P_X extends to a monotone function f* : P_Y → P_X. We call it
the reindexing function (along f). Furthermore, the assignment f ↦ f* satisfies
(contravariant) functoriality: id_X* = id_{P_X} and (g ∘ f)* = f* ∘ g*. A fibration is
a bifibration if each reindexing function f* : P_Y → P_X for f ∈ V(X, Y) has a left
adjoint, denoted by f_* : P_X → P_Y. The object f_*ψ is always associated with a
morphism f̲ψ : ψ → f_*ψ above f, and this is called the opcartesian lifting of f with
ψ. For the universal property of the opcartesian lifting, see Jacobs [22, Def. 9.1.1].


Fibrations for our Assertion Logic It is widely known that coherent fibrations 
are suitable for interpreting the ∧, ∨, ∃, =-fragment of first-order logic (see [22,
Chapter 4, Def. 4.2.1]). Based on this fact, we introduce a class of fibrations that 
are suitable for our assertion logic—due to the countable joins of the assertion 
logic we modify the definition of coherent fibration accordingly. 


Definition 8. A fibration for assertion logic over V is a posetal fibration p :
P → V for cartesian V with distributive countable coproducts, such that:


1. Each fibre poset P_X is a distributive lattice with finite meets ⊤_X, ∧ and
countable joins ⊥_X, ∨.

2. Each reindexing function f* preserves finite meets and countable joins.

3. The reindexing function c*_{X,Y} along the contraction c_{X,Y} ≜ ⟨π₁, π₂, π₂⟩ ∈
V(X × Y, X × Y × Y) has a left adjoint Eq_{X,Y} ⊣ c*_{X,Y}. This satisfies the
Beck-Chevalley condition and the Frobenius property; we refer to [22, Definition 3.4.1].

4. The reindexing function w*_{X,Y} along the weakening w_{X,Y} ≜ π₁ ∈ V(X × Y, X)
has a left adjoint ∃_{X,Y} ⊣ w*_{X,Y}. This satisfies the Beck-Chevalley condition and
the Frobenius property; we refer to [22, Definition 1.9.1, 1.9.12].


This is almost the same as the definition of coherent fibrations [22, Definition 
4.2.1]; the difference is that 1) the base category V has countable coproducts 2) 
we require each fibre to be a poset; this makes object equalities hold on-the-nose, 
and 3) we require each fibre to have countable joins. They will be combined with 
countable coproducts of V to equip P with a countable coproduct [22]. 


Example 6. A typical example of a fibration for assertion logic is the subobject
fibration p^Set : Pred → Set; the category Pred has as objects pairs (X, ψ) of
sets such that ψ ⊆ X, and as morphisms of type (X, ψ) → (Y, φ) the functions
f : X → Y such that f(ψ) ⊆ φ. The functor p^Set sends (X, ψ) to X and f to itself.
More examples can be found in the work of Jacobs [22, Section 4].
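
For finite sets, the reindexing and its left adjoint in the subobject fibration can be computed directly: reindexing along f is the inverse image and the left adjoint is the direct image. The following sketch uses Python sets; the function names `reindex` and `direct_image` and the sample data are assumptions of this illustration.

```python
def reindex(f, domain, phi):
    """f* phi: inverse image of the predicate phi (a subset of Y) along f."""
    return {x for x in domain if f(x) in phi}

def direct_image(f, psi):
    """f_* psi: direct image of psi (a subset of X) along f."""
    return {f(x) for x in psi}

X = set(range(6))
f = lambda x: x % 3        # a function X -> {0, 1, 2}
psi = {0, 1, 3}            # a predicate over X

# The adjunction: f_* psi <= phi  holds exactly when  psi <= f* phi.
holds_for_01 = (direct_image(f, psi) <= {0, 1}, psi <= reindex(f, X, {0, 1}))
holds_for_0 = (direct_image(f, psi) <= {0}, psi <= reindex(f, X, {0}))
```

In each pair the two components agree, which is the defining property of the left adjoint in this instance; the order on each fibre is subset inclusion.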


For a parallel pair of morphisms f, g ∈ V(X, Y), we define the equality
predicate Eq(f, g) above X to be ⟨id_X, f, g⟩* Eq_{X,Y}(⊤_{X×Y}) [22, Notation 3.4.2].
Intuitively, Eq(f, g) corresponds to the predicate {x ∈ X | f(x) = g(x)}. In this
paper, we will use some facts about the equality predicate shown by Jacobs [22,
Proposition 3.4.6, Lemma 3.4.5, Notation 3.4.2, Example 4.3.7].


The Semantics of Assertion Logic We move to the semantics of our assertion
logic in a fibration p : P → V for assertion logic. The basic idea is to interpret
a formula ψ ∈ Fml_{Σ_l}(Γ) as an object in P_⟦Γ⟧, and an entailment Γ | ψ ⊢ φ as
the order relation ⟦ψ⟧ ≤ ⟦φ⟧ in P_⟦Γ⟧. The semantics is given by the following
interpretation of the data specifying the assertion logic (given in Section 3.3):


— A fibration p : P → V for assertion logic.

— An interpretation ⟦−⟧ of Σ_l in P that coincides with the one ⟦−⟧ of Σ in V.
— An object ⟦P⟧ ∈ P_⟦par(P)⟧ for each atomic proposition P ∈ P_l (recall par
assigns input sorts to atomic propositions in P_l, parameterising the logic).

— We require that for any Γ ∈ Ctx_{Σ_l} and (ψ, φ) ∈ Axiom(Γ), ⟦ψ⟧ ≤ ⟦φ⟧
holds in P_⟦Γ⟧. This expresses an implicational axiom in the coherent logic.


The interpretation ⟦ψ⟧ of ψ ∈ Fml_{Σ_l}(Γ) is inductively defined as a P_⟦Γ⟧-object:


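\[
\begin{gathered}
[\![P(t_1, \ldots, t_n)]\!] = \langle [\![t_1]\!], \ldots, [\![t_n]\!] \rangle^* [\![P]\!], \qquad
[\![t = u]\!] = \mathrm{Eq}([\![t]\!], [\![u]\!]), \\
[\![\textstyle\bigwedge \psi_i]\!] = \bigwedge [\![\psi_i]\!], \qquad
[\![\textstyle\bigvee \psi_i]\!] = \bigvee [\![\psi_i]\!], \qquad
[\![\exists x : s.\ \psi]\!] = \exists_{[\![\Gamma]\!], [\![s]\!]} [\![\psi]\!].
\end{gathered}
\]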


5.2 Interpretation of Graded Hoare Logic 


We finally introduce the semantics of Graded Hoare logic. This semantics
interprets derivations of GHL judgements ⊢_m {ψ} P {φ} as m-graded morphisms in
a graded category. Moreover, it is built above the interpretation ⟦P⟧ ∈ C(M, M)
of the program P in the non-graded semantics introduced in Section 4.3. The
underlying structure is given as a combination of a fibration for the assertion
logic and a graded category over C, as depicted in (1) (Section 2, p. 237).
Definition 9. A GHL structure over a Freyd category (V, 1, ×, C, I, ∗) with
countable coproducts and a fibration p : P → V for assertion logic comprises:


1. An M-graded Freyd category (P, 1̇, ×̇, E, İ, ⊛) with homogeneous countable
coproducts.

2. A function q_{ψ,φ,m} : E(ψ, φ)(m) → C(pψ, pφ) (subscripts may be omitted),
which maps to the base denotational model, erasing assertions and grades.


The above data satisfy the following properties:


1. The function q behaves ‘functorially’, preserving structure from E to C:


\[
\begin{gathered}
q(\mathrm{id}_\psi) = \mathrm{id}_{p\psi}, \qquad q(g \circ f) = qg \circ qf, \qquad q_{\psi,\phi,n}(\uparrow_m^n f) = q_{\psi,\phi,m} f, \\
q(\dot{I} f) = I(pf), \qquad q(f \circledast g) = pf * qg.
\end{gathered}
\]


2. For any homogeneous countable coproduct {ι_i ∈ E(ψ_i, φ)(1)}_{i∈A}, {qι_i ∈
C(pψ_i, pφ)}_{i∈A} is a countable coproduct.
3. (Ex falso quodlibet) q_{⊥_X,φ,m} : E(⊥_X, φ)(m) → C(X, pφ) is a bijection.


The last statement asserts that if the precondition is the least element ⊥_X in
the fibre over X ∈ V, which represents the false assertion, we trivially conclude
any postcondition φ and grading m for any morphism of type X → pφ in C.

The semantics of GHL then requires a graded Freyd category with countable 
coproducts, and morphisms in the graded category guaranteeing a sound model 
of the effectful primitives (commands/procedures), captured by the data: 


— A GHL structure (P, 1̇, ×̇, E, İ, ⊛, q) over the Freyd category (V, 1, ×, C, I, ∗)
with countable coproducts and the fibration p : P → V for assertion logic.

— For each c ∈ C_c(ψ, m) a morphism ⟨c⟩ ∈ E(⟦ψ⟧, 1̇)(m) such that q⟨c⟩ = ⟦c⟧.

— For each p ∈ C_p^s(ψ, m, φ) a morphism ⟨p⟩ ∈ E(⟦ψ⟧, ⟦φ⟧)(m) such that q⟨p⟩ = ⟦p⟧.


where ⟦c⟧, ⟦p⟧ and later ⟦e⟧ are from the underlying non-graded model (Sec. 4.3).
We interpret a derivation of a GHL judgement ⊢_m {φ} P {ψ} as a morphism


⟦⊢_m {φ} P {ψ}⟧ ∈ E(⟦φ⟧, ⟦ψ⟧)(m) such that q_{⟦φ⟧,⟦ψ⟧,m}⟦⊢_m {φ} P {ψ}⟧ = ⟦P⟧.


The constraint on the right is guaranteed by the soundness of the interpretation
(Theorem 1). From the functor-as-refinement viewpoint [32], the interpretation
⟦⊢_m {φ} P {ψ}⟧ witnesses that ⟦P⟧ respects refinements φ and ψ of M, and
additionally it witnesses the grade of ⟦P⟧ being m. We first cover the simpler
cases of the interpretation of GHL derivations:
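\[
\begin{aligned}
[\![\vdash_1 \{\psi\}\ \mathtt{skip}\ \{\psi\}]\!] &= \mathrm{id}_{[\![\psi]\!]} \\
[\![\vdash_{m_1 \cdot m_2} \{\psi\}\ P_1 ; P_2\ \{\phi\}]\!] &= [\![\vdash_{m_2} \{\psi_1\}\ P_2\ \{\phi\}]\!] \circ [\![\vdash_{m_1} \{\psi\}\ P_1\ \{\psi_1\}]\!] \\
[\![\vdash_1 \{\psi[e/v]\}\ v := e\ \{\psi\}]\!] &= \dot{I}\bigl(\overline{\mathrm{sub}(v, e)}\,[\![\psi]\!]\bigr) \\
[\![\vdash_m \{\psi\}\ \mathtt{do}\ c\ \{\psi\}]\!] &= \dot{I}(\pi_1) \circ \mathrm{ext}\langle c \rangle \\
[\![\vdash_m \{\psi\}\ \mathtt{do}\ v \leftarrow p\ \{(\exists v.\ \psi) \wedge \phi\}]\!] &= \dot{I}\bigl(\underline{\mathrm{upd}_v}\,([\![\psi]\!] \mathbin{\dot{\times}} [\![\phi]\!])\bigr) \circ \mathrm{ext}\langle p \rangle \\
[\![\vdash_{m'} \{\psi'\}\ P\ \{\phi'\}]\!] &= \dot{I}([\![\phi]\!] \le [\![\phi']\!]) \circ {\uparrow_m^{m'}} [\![\vdash_m \{\psi\}\ P\ \{\phi\}]\!] \circ \dot{I}([\![\psi']\!] \le [\![\psi]\!])
\end{aligned}
\]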
The morphisms with upper and lower lines are cartesian liftings and op-cartesian
liftings in the fibration p : P → V of the assertion logic. The codomain of the
interpretation of the procedure call do v ← p is equal to ⟦(∃v. ψ) ∧ φ⟧.

The above interpretations largely follow the form of the underlying model 
of Section 4.3, with the additional information and underlying categorical ma- 
chinery for grades and assertions here; we now map to E. The interpretation of 
conditional and loop commands requires some more reasoning. 


Conditionals Let p₁, p₂ be the interpretations of each branch of the conditional
command:


\[
\begin{aligned}
p_1 &= [\![\vdash_m \{\psi \wedge e_b = \mathtt{tt}\}\ P_1\ \{\phi\}]\!] \in E([\![\psi \wedge e_b = \mathtt{tt}]\!], [\![\phi]\!])(m) \\
p_2 &= [\![\vdash_m \{\psi \wedge e_b = \mathtt{ff}\}\ P_2\ \{\phi\}]\!] \in E([\![\psi \wedge e_b = \mathtt{ff}]\!], [\![\phi]\!])(m)
\end{aligned}
\]


We consider the cocartesian lifting of ⟨id_M, ⟦e_b⟧⟩ with ⟦ψ⟧, which is a morphism
⟦ψ⟧ → ⟨id_M, ⟦e_b⟧⟩_*⟦ψ⟧. We name its codomain Im. Next, the cartesian morphisms
t̅m̅(Im) : tm*Im → Im and f̅m̅(Im) : fm*Im → Im in P are above the coproduct
(M × Bool, tm, fm) in V. Then the interpretations of the preconditions of P₁, P₂
are inverse images of Im along tm, fm : M → M × Bool:


Lemma 1. ⟦ψ ∧ e_b = tt⟧ = tm*Im and ⟦ψ ∧ e_b = ff⟧ = fm*Im.


The side condition of the conditional rule ensures that (Im, t̅m̅(Im), f̅m̅(Im)) is a
coproduct in P:


Lemma 2. Γ_M | ψ ⊢ e_b = tt ∨ e_b = ff implies Im = tm_* tm* Im ∨ fm_* fm* Im.


Therefore the image of the coproduct (Im, t̅m̅(Im), f̅m̅(Im)) by İ yields a
homogeneous coproduct in E. We take the cotupling [p₁, p₂] ∈ E(Im, ⟦φ⟧)(m) with
respect to this homogeneous coproduct. We finally define the interpretation of
the conditional rule to be the following composite:


\[
[\![\vdash_m \{\psi\}\ \mathtt{if}\ e_b\ \mathtt{then}\ P_1\ \mathtt{else}\ P_2\ \{\phi\}]\!]
= [p_1, p_2] \circ \dot{I}\bigl(\underline{\langle \mathrm{id}_M, [\![e_b]\!] \rangle}\,[\![\psi]\!]\bigr)
\in E([\![\psi]\!], [\![\phi]\!])(m).
\]


Loops Fix N ∈ ℕ, and suppose that ⊢_m {ψ_{i+1}} P {ψ_i} is derivable in the graded
Hoare logic for each 0 ≤ i < N. Let p_i ∈ E(⟦ψ_{i+1}⟧, ⟦ψ_i⟧)(m) be the interpretation
⟦⊢_m {ψ_{i+1}} P {ψ_i}⟧. We then define a countable family of morphisms (we use
here ex falso quodlibet):


\[
b_i \triangleq
\begin{cases}
q^{-1}_{\bot_M, [\![\psi_0]\!], m^N}\bigl([\![P]\!]^{(i)}\bigr) \in E(\bot_M, [\![\psi_0]\!])(m^N) & (i \neq N) \\
p_0 \circ \cdots \circ p_{N-1} \in E([\![\psi_N]\!], [\![\psi_0]\!])(m^N) & (i = N)
\end{cases}
\]


Let θ_i = dom(b_i). Then ∐_{i∈ℕ} θ_i = ⋁_{i∈ℕ} [i]_* θ_i = [N]_*⟦ψ_N⟧ because [i]_* θ_i is either
⊥_{M×Nat} or [N]_*⟦ψ_N⟧. We then send the coproduct θ_i → ∐_{i∈ℕ} θ_i by İ and obtain
a homogeneous coproduct in E. By taking the cotupling of all b_i with this
homogeneous coproduct, we obtain a morphism [b_i]_{i∈ℕ} ∈ E([N]_*⟦ψ_N⟧, ⟦ψ_0⟧)(m^N).


Lemma 3. Γ_M | ψ_N ⊢ e_n = ⌈N⌉ implies ⟨id_M, ⟦e_n⟧⟩_*⟦ψ_N⟧ = [N]_*⟦ψ_N⟧.


We then define

\[
[\![\vdash_{m^N} \{\psi_N\}\ \mathtt{loop}\ e_n\ \mathtt{do}\ P\ \{\psi_0\}]\!]
= [b_i]_{i \in \mathbb{N}} \circ \dot{I}\bigl(\underline{\langle \mathrm{id}_M, [\![e_n]\!] \rangle}\,[\![\psi_N]\!]\bigr).
\]


Theorem 1 (Soundness of GHL). For any derivation of a GHL judgement
⊢_m {φ} P {ψ}, we have q_{⟦φ⟧,⟦ψ⟧,m}⟦⊢_m {φ} P {ψ}⟧ = ⟦P⟧.


5.3 Instances of Graded Hoare Logic 


We first present a construction of GHL structures from graded monad liftings, 
which are a graded version of the concept of monad lifting [11,19,26]. 


Definition 10 (Graded Liftings of Monads). Consider two cartesian categories
E and C and a functor q : E → C strictly preserving finite products. We say
that a strong M-graded monad (Ṫ, η̇, μ̇_{m,m′}, ṡt_m) on E is an M-graded lifting
of a strong monad (T, η, μ, st) on C along q if

\[
q \circ \dot{T}m = T \circ q, \qquad q(\dot{\eta}_\psi) = \eta_{q\psi}, \qquad
q(\dot{\mu}_{m,m',\psi}) = \mu_{q\psi}, \qquad q(\dot{T}(m_1 \le m_2)_\psi) = \mathrm{id}, \qquad
q(\dot{\mathrm{st}}_{\psi,\phi,m}) = \mathrm{st}_{q\psi, q\phi}.
\]


Theorem 2. Let V be a cartesian category with distributive countable coproducts,
and let p : P → V be a fibration for assertion logic. Let T be a strong monad
on V and Ṫ be an M-graded lifting of T along p. Then the M-graded Freyd
category (P, 1̇, ×̇, P_Ṫ, İ, ⊛) with homogeneous countable coproducts, together with
the function q_{ψ,φ,m} : P_Ṫ(ψ, φ)(m) → V_T(pψ, pφ) defined by q_{ψ,φ,m}(f) = pf, is a
GHL structure over (V, 1, ×, V_T, I, ∗) and p.


Before seeing examples, we introduce a notation and fibrations for the
assertion logic. Let p : P → V be a fibration for the assertion logic. Below we use the
following notation: for f ∈ V(I, J), ψ ∈ P_I and φ ∈ P_J, by f : ψ →̇ φ we
mean the statement “there exists a morphism ḟ ∈ P(ψ, φ) such that pḟ = f”.
Such an ḟ is unique due to the faithfulness of p : P → V.


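Example 7 (Example 4: Union Bound Logic). To derive the GHL structure suitable
for the semantics of the Union Bound Logic discussed in Example 4, we
invoke Theorem 2 by letting $p$ be $p^{\mathbf{Set}} \colon \mathbf{Pred} \to \mathbf{Set}$ (Example 6), $T$ be the
subdistribution monad $\mathcal{D}$ and $\dot{T}$ be the strong $(\mathbb{R}_{\ge 0}, \le, 0, +)$-graded lifting $\mathcal{U}$ of
$\mathcal{D}$ defined by $\mathcal{U}(\delta)(X, P) \triangleq (\mathcal{D}(X), \{d \mid d(X \setminus P) \le \delta\})$. The induced GHL structure
is suitable for the semantics of GHL for Union Bound Logic in Example
4. The soundness of the inference rules follows from the GHL structure, as we
showed in Section 5.2. To complete the semantics of GHL for the Union Bound
Logic, we give the semantics $\langle\!\langle p \rangle\!\rangle$ of procedures $p \in C_p$. Example 4 already gave
a semantic condition for these operators:

$$\begin{aligned}
C_p(\phi, \beta, \psi) &= \{\mathtt{sample}_{\mu,e} \mid \forall s.\ s \in \llbracket\phi\rrbracket \implies \Pr_{s' \leftarrow \llbracket\mathtt{sample}_{\mu,e}\rrbracket(s)}[s' \in \llbracket\neg\psi\rrbracket] \le \beta\} \\
&= \{\mathtt{sample}_{\mu,e} \mid \llbracket\mathtt{sample}_{\mu,e}\rrbracket \in \mathbf{Pred}_{\mathcal{U}}(\llbracket\phi\rrbracket, \llbracket\psi\rrbracket)(\beta)\}
\end{aligned}$$

For any $\mathtt{sample}_{\mu,e} \in C_p(\phi, \beta, \psi)$, the interpretation $\langle\!\langle \mathtt{sample}_{\mu,e} \rangle\!\rangle$ is $\llbracket\mathtt{sample}_{\mu,e}\rrbracket$.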
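The membership condition of the graded lifting used in this example can be illustrated on finite subdistributions. The following is a minimal sketch, not part of the formal development: the function name `in_union_bound_lifting`, the dictionary encoding of subdistributions, and the sample data are all assumptions made for illustration.

```python
from fractions import Fraction

def in_union_bound_lifting(d, predicate, delta):
    """Check the condition d(X minus P) <= delta for a finite subdistribution.

    d maps outcomes to probabilities (total mass at most 1), predicate
    decides membership in P, and delta is the grade (failure bound).
    """
    failure_mass = sum(p for x, p in d.items() if not predicate(x))
    return failure_mass <= delta

# Hypothetical sampler output: the postcondition fails with probability 1/10.
d = {"good": Fraction(9, 10), "bad": Fraction(1, 10)}

def is_good(x):
    return x == "good"

print(in_union_bound_lifting(d, is_good, Fraction(1, 10)))  # bound is met
print(in_union_bound_lifting(d, is_good, Fraction(1, 20)))  # bound is too tight
```

Exact rationals are used so that the comparison against the grade is not affected by floating-point rounding.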


Example 8 (Example 3: Program Counter Security). To derive the GHL struc- 
ture suitable for GHL with program counter security, we invoke Theorem 2 with: 
– The category $\mathbf{ERel}$ of endorelations defined as follows: an object $(X, R)$ is a
pair of $X \in \mathbf{Set}$ and $R \subseteq X \times X$ (i.e. an endorelation $R$ on $X$) and an arrow
$f \colon (X, R) \to (Y, S)$ is a function $f \colon X \to Y$ such that $(f \times f)(R) \subseteq S$.

– The fibration for the assertion logic $e \colon \mathbf{ERel} \to \mathbf{Set}$ given by $(X, R) \mapsto X$ and
$f \mapsto f$.

– The writer monad $W_s X = X \times \{\mathtt{tt}, \mathtt{ff}\}^*$ on $\mathbf{Set}$ with the monoid of bit strings.

– The strong $2^*$-graded lifting of $W_s$ along $e \colon \mathbf{ERel} \to \mathbf{Set}$, given by
$\dot{W}_s\sigma(X, R) = (W_s X, \{((x, \sigma'), (y, \sigma')) \mid (x, y) \in R \land \sigma' \le \sigma\})$.


The derived GHL structure is suitable for the semantics of GHL in Example 3. 
To complete the structure of the logic, we need to interpret two commands 
$\mathtt{cfTT}, \mathtt{cfFF} \in \mathrm{CExp}$ and set the axioms of commands $C_c$.

First [cfTT], [cfFF]: [M] > 1 in ERel,;,, are defined by [cfTT] = (*, tt) and 
[cfFF] = (*, ff). Finally, we define $C_c$ by (recall that $\le$ is the prefix ordering of strings):

$$C_c(\psi, \sigma) = \{\mathtt{cfTT} \mid \mathtt{tt} \le \sigma\} \cup \{\mathtt{cfFF} \mid \mathtt{ff} \le \sigma\}.$$

Note that the graded lifting $\dot{W}_s\sigma$ relates only pairs $(x, \sigma')$ and $(y, \sigma')$ with
common strings of control flow. Hence, a derivation in this logic
forces the target program to have the same control flow under the precondition.


Example 9 (GHL Structure from the product comonad). In the category $\mathbf{Set}$, the
functor $CX \triangleq X \times \mathbb{N}$ forms a coproduct-preserving comonad called the product
comonad. The right adjoint $J \colon \mathbf{Set} \to \mathbf{Set}_C$ of the coKleisli resolution of $C$ yields
a Freyd category with countable coproducts. We next introduce an $(\mathbb{N}, \le, 0, \max)$-graded
lifting $\dot{C}$ of the comonad $C$ along the fibration $p^{\mathbf{Set}} \colon \mathbf{Pred} \to \mathbf{Set}$. It is
defined by $\dot{C}n(X, P) \triangleq (CX, \{(x, m) \in X \times \mathbb{N} \mid x \in P, m \ge n\})$. Similarly, we
give an $(\mathbb{N}, \le, 0, \max)$-graded Freyd category $(J, \circledast)$ induced by the graded lifting
$\dot{C}$. In this way we obtain a GHL structure.

By instantiating GHL with the above GHL structure, we obtain a program
logic useful for reasoning about security levels. For example, when program $P_1$
requires security level 3 and $P_2$ requires security level 7, the sequential composition
$P_1; P_2$ requires the higher security level 7 ($= \max(3, 7)$).

We give a simple structure for verifying security levels determined by memory
access. Fix a function $\mathrm{VarLV} \colon \mathrm{dom}(\Gamma_M) \to \mathbb{N}$ assigning security levels to
variables. For any expression $e$, we define its required security level $\mathrm{SecLV}(e) =
\sup\{\mathrm{VarLV}(x) \mid x \in \mathrm{FV}(e)\}$. Using this, for each expression $e$ of sort $s \in S$ we
introduce a procedure $\mathtt{secr}_e \in \mathrm{PExp}$, called a secured expression. It returns the
value of $e$ if the level is high enough; otherwise it returns a meaningless constant:

$$\llbracket\mathtt{secr}_e\rrbracket(n, \xi) = \text{if } n \ge \mathrm{SecLV}(e) \text{ then } \llbracket e \rrbracket(\xi) \text{ else a fixed constant } c_s.$$

Secured expressions can be introduced through the following $C_p$:

$$C_p(\phi, l, \psi) = \{\mathtt{secr}_e \mid e : s,\ \llbracket\mathtt{secr}_e\rrbracket \in \mathbf{Pred}_{\dot{C}}(\llbracket\phi\rrbracket, \llbracket\psi\rrbracket)(l),\ \mathrm{SecLV}(e) \le l\}.$$

The pomonoid $(\mathbb{N}, \le, 0, \max)$ in the above can also be replaced with a join semilattice
with a least element $(Q, \le, \bot, \vee)$. Thus, GHL can be instantiated to a
graded comonadic model of security and its associated reasoning.
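To make the security-level reading concrete, here is a small sketch of how grades compose under max and how a secured expression behaves. It is only an illustration under stated assumptions: the names `seq_grade`, `sec_lv` and `secr`, the dictionary encoding of memories, and the choice of a non-strict comparison between the available level and the required level are not taken from the formal development.

```python
def seq_grade(*levels):
    """Grade of a sequential composition in the pomonoid (N, <=, 0, max)."""
    grade = 0  # unit of the monoid
    for level in levels:
        grade = max(grade, level)
    return grade

def sec_lv(free_vars, var_lv):
    """Required level of an expression: the largest level among its free variables."""
    return max((var_lv[x] for x in free_vars), default=0)

def secr(expr, free_vars, var_lv, fallback):
    """Secured expression: evaluate expr only if the available level n suffices."""
    required = sec_lv(free_vars, var_lv)

    def run(n, memory):
        # Assumption: the comparison is non-strict.
        return expr(memory) if n >= required else fallback

    return run

# Hypothetical variable levels and expression x + y.
var_lv = {"x": 1, "y": 5}
add_xy = secr(lambda m: m["x"] + m["y"], ["x", "y"], var_lv, fallback=0)

print(seq_grade(3, 7))                # level of P1; P2 for levels 3 and 7
print(add_xy(5, {"x": 2, "y": 3}))    # level suffices: the value of x + y
print(add_xy(4, {"x": 2, "y": 3}))    # level too low: the fallback constant
```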
6 Related Work 


Several works have studied abstract semantics of Hoare Logic. Martin et al. [31] 
give a categorical framework based on traced symmetric monoidal closed cat- 
egories. They also show that their framework can handle extensions such as 
separation logic. However, their framework does not directly model effects and it
cannot accommodate grading as is. Goncharov and Schröder [18] study a Hoare
Logic to reason in a generic way about programs with side effects. Their logic
and underlying semantics are based on an order-enriched monad and they show
a relative completeness result. Similarly, Hasuo [20] studies an abstract weakest
precondition semantics based on an order-enriched monad. A similar categorical
model has also been used by Jacobs [23] to study the Dijkstra monad and the
Hoare monad. In the logic by Goncharov and Schröder [18] effects are encapsulated
in monadic types, while the weakest precondition semantics by Hasuo [20]
and the semantics by Jacobs [23] have no underlying calculus. Moreover, none 
of them is graded. Maillard et al. [29] study a semantics framework based on 
the Dijkstra monad for program verification. Their framework enables reason- 
ing about different side effects and it separates specification from computation. 
Their Dijkstra monad has a flavor of grading but the structure they use is more 
complex than a pomonoid. Maillard et al. [30] focus on relational program log- 
ics for effectful computations. They show how these logics can be derived in a 
relational dependent type theory, but their logics are not graded. 

As we discussed in the introduction, several works have used grading structures
similar to the one we study in this paper, although often with different
names. Katsumata studied monads graded by a pomonoid as a semantic
model for effect systems [24]. A similar approach has also been studied elsewhere
[36,42]. Formal categorical properties of graded monads are pursued by
Fujii et al. [13]. Zhang defines a notion of graded category, but it differs from ours,
and is instead closer to a definition of a graded monad [55]. As we showed in
Section 4, graded categories can be constructed both by monads and comonads
graded by a pomonoid, and they can also capture graded structures that do not arise
from either of them. Milius et al. [33] also studied monads graded by a pomonoid
in the context of trace semantics where the grading represents a notion of depth
corresponding to trace length. Exploring whether there is a generalization of our
work to traces is an interesting direction for future work.

Various works study comonads graded with a semiring structure as a semantic 
model of contextual computations captured by means of type systems [7,16,44]. 
In contrast, our graded comonads are graded by a pomonoid. The additive struc- 
ture of the semiring in those works is needed to merge the gradings of different 
instances of the same variable. This is natural for the λ-calculus, where the context
represents multiple inputs, but there is only one conclusion (output). Here
instead, we focus on an imperative language. So, we have only one input, the 
starting memory, and one output, the updated memory. Therefore, it is natural 
to have just the multiplicative structure of the semiring as a pomonoid. The 
categorical axiomatics of semiring-graded comonads are studied by Katsumata 
from the double-category theoretic perspective [25]. 
Apart from graded monads, several generalizations of monads have been proposed.
Atkey introduces parameterized monads and corresponding parameterized
Freyd categories [1], demonstrating that parameterized monads naturally model 
effectful computations with preconditions and postconditions. Tate defines pro- 
ductors with composability of effectful computations controlled by a relational 
‘effector’ structure [53]. Orchard et al. define category-graded monads, general- 
izing graded and parameterised monads via lax functors and sketch a model of 
Union Bound Logic in this setting (but predicates and graded-predicate inter- 
action are not modelled, as they are here) [41]. Interesting future work is to 
combine these general models of computational effects with Hoare logic. 


7 Conclusion 


We have presented a Graded Hoare Logic as a parameterisable framework for 
reasoning about programs and their side effects, and studied its categorical se- 
mantics. The key guiding idea is that grading can be seen as a refinement of 
effectful computations. This has brought us naturally to graded categories but 
to fully internalize this refinement idea we further introduced the new notion 
of graded Freyd categories. To show the generality of our framework we have 
shown how different examples are naturally captured by it. 
We conclude with some reflections on possible future work. 


Future work Carbonneaux et al. present a quantitative verification approach for 
amortized cost analysis via a Hoare logic augmented with multivariate quantities 
associated to program variables [8]. Judgments $\vdash \{\Gamma; Q\}\, S\, \{\Gamma'; Q'\}$ have pre- and
post-conditions $\Gamma$ and $\Gamma'$ and potential functions $Q$ and $Q'$. Their approach can
be mapped to GHL with a grading monoid representing how the potential functions
change. However, the multivariate nature of the analysis requires a more
fine-grained connection between the structure of the memory and the structure
of grades, which has not been developed yet. We leave this for future work.

GHL allows us to capture the dependencies between assertions and grading 
that graded program logics usually use. However, some graded systems (e.g. [4]) 
use more explicit dependencies by allowing grade variables—which are also used 
for grading polymorphism. We plan to explore this direction in future work. 

The setting of graded categories in this work subsumes both graded mon- 
ads and graded comonads and allows flexibility in the model. However, most 
of our examples in Section 5.3 are related to graded monads. The literature 
contains various graded comonad models of data-flow properties, such as liveness
analysis [44], sensitivities [7], timing and scheduling [16], and information-flow
control [40]. Future work is to investigate how these structures could be adapted
to GHL for reasoning about programs.
Abstract. Property-based testing uses randomly generated inputs to 
validate high-level program specifications. It can be shockingly effective 
at finding bugs, but it often requires generating a very large number of 
inputs to do so. In this paper, we apply ideas from combinatorial testing, 
a powerful and widely studied testing methodology, to modify the dis- 
tributions of our random generators so as to find bugs with fewer tests. 
The key concept is combinatorial coverage, which measures the degree 
to which a given set of tests exercises every possible choice of values for 
every small combination of input features. 


In its “classical” form, combinatorial coverage only applies to programs 
whose inputs have a very particular shape—essentially, a Cartesian prod- 
uct of finite sets. We generalize combinatorial coverage to the richer world 
of algebraic data types by formalizing a class of sparse test descriptions 
based on regular tree expressions. This new definition of coverage inspires 
a novel combinatorial thinning algorithm for improving the coverage of 
random test generators, requiring many fewer tests to catch bugs. We 
evaluate this algorithm on two case studies, a typed evaluator for Sys- 
tem F terms and a Haskell compiler, showing significant improvements 
in both. 


Keywords: Combinatorial testing, Combinatorial coverage, QuickCheck, 
Property-based testing, Regular tree expressions, Algebraic data types 


1 Introduction 


Property-based testing, popularized by tools like QuickCheck [7], is a principled 
way of testing software that focuses on functional specifications rather than 
suites of input-output examples. A property is a formula like 


$$\forall x.\ P(x, f(x)),$$
where $f$ is the function under test and $P$ is some executable logical relationship
between an input $x$ and the output $f(x)$. The test harness generates random
values for $x$, hoping to either uncover a counterexample (an $x$ for which
$\neg P(x, f(x))$, indicating a bug) or else provide confidence that $f$ is correct with
respect to $P$.
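The testing loop described here can be sketched in a few lines. This is an illustrative stand-in rather than QuickCheck's actual machinery: `find_counterexample`, `buggy_sort`, `preserves_length` and `small_list` are hypothetical names, and the generator is a plain function of a seeded random source.

```python
import random

def find_counterexample(prop, f, gen, num_tests=100, seed=0):
    """Search for an input x with not P(x, f(x)); return it, or None."""
    rng = random.Random(seed)
    for _ in range(num_tests):
        x = gen(rng)
        if not prop(x, f(x)):
            return x  # counterexample: indicates a bug in f
    return None  # no failure observed; this raises confidence but proves nothing

# Hypothetical function under test: a "sort" that accidentally drops duplicates.
def buggy_sort(xs):
    return sorted(set(xs))

# The property P: sorting must preserve the length of the input.
def preserves_length(xs, ys):
    return len(xs) == len(ys)

# A simple random generator of short lists over a small range of values.
def small_list(rng):
    return [rng.randint(0, 3) for _ in range(rng.randint(0, 5))]

print(find_counterexample(preserves_length, sorted, small_list))
print(find_counterexample(preserves_length, buggy_sort, small_list))
```

Because every input is drawn independently, nothing stops the loop from revisiting near-identical lists, which is the inefficiency the rest of the introduction addresses.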

With a well-designed random test case generator, property-based testing has 
a non-zero probability of generating every valid test case (up to a given size limit); 
property-based testing is thus guaranteed to find any bug that can be provoked 
by an input below the size limit... eventually. Unfortunately, since each input is 
generated independently, random testing may end up repeating the same or sim- 
ilar tests many times before happening across the specific input which provokes 
a bug. This poses a particular problem in settings like continuous integration, 
where feedback is needed quickly—it would be nice to have an automatic way to 
guide the generator to a more interesting and diverse set of inputs, “thinning” 
the distribution to find bugs with fewer tests. 

Combinatorial testing, an elegant approach to testing from the software en- 
gineering literature [2, 16,17], offers an attractive metric for judging which tests 
are most interesting. In its classical presentation, combinatorial testing advocates 
choosing tests to maximize t-way coverage of a program’s input space—i.e., to 
exercise all possible choices of concrete values for every combination of t input 
parameters. For example, suppose a program p takes Boolean parameters w, x, 
y, and z, and suppose we want to test that p behaves well for every choice of 
values for every pair of these four parameters. If we choose carefully, we can 
check all such choices—all 2-way interactions—with just five test cases: 


1. w = False   x = False   y = False   z = False
2. w = False   x = True    y = True    z = True
3. w = True    x = False   y = True    z = True
4. w = True    x = True    y = False   z = True
5. w = True    x = True    y = True    z = False


You can check for yourself: for any two parameters, every combination of values 
for these parameters is covered by some test. For example, “w = False and x 
= False” is covered by #1, while both “w = True and x = True” and “w = 
True and y = True” are covered by #5. Any other test case we could come 
up with would check a redundant set of 2-way interactions. Thus, we get 100% 
pairwise coverage with just five out of the $2^4 = 16$ possible inputs. This advantage
improves exponentially with the number of parameters. 
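The claim that these five tests exercise every 2-way interaction can also be checked mechanically. The sketch below is only a convenience for the reader; the helper name `uncovered_pairs` and the tuple encoding of tests as (w, x, y, z) are assumptions.

```python
from itertools import combinations, product

# The five test cases from the text, as (w, x, y, z).
tests = [
    (False, False, False, False),
    (False, True, True, True),
    (True, False, True, True),
    (True, True, False, True),
    (True, True, True, False),
]

def uncovered_pairs(tests, num_params=4):
    """List every (positions, values) 2-way interaction that no test exercises."""
    missing = []
    for i, j in combinations(range(num_params), 2):
        for vi, vj in product([False, True], repeat=2):
            if not any(t[i] == vi and t[j] == vj for t in tests):
                missing.append(((i, j), (vi, vj)))
    return missing

print(uncovered_pairs(tests))       # empty: all 2-way interactions are covered
print(uncovered_pairs(tests[:4]))   # dropping test 5 leaves some interactions uncovered
```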

Why is this interesting? Because surveys of real-world systems have shown 
that bugs are often provoked by specific choices of just a few parameters [16]. 
Indeed, one study involving a distributed database at NASA found that, out 
of 100 known failures, 93 were caused by 2-way parameter interactions; the 
remaining 7 failures were each caused by no more than 6 parameters interacting 
together [14]. This suggests that combinatorial testing is an effective way to 
choose test cases for real systems. 

If combinatorial coverage can be used to concentrate bug-finding power into 
small sets of tests, it is natural to wonder whether it could also be used to thin the 
distribution of a random generator. So far, combinatorial testing has mostly been 
applied in settings where the input to a program is just a vector of parameters, 
each drawn from a small finite set. Could we take it further? In particular, could 
we transfer ideas from combinatorial testing to the richer setting addressed by 
QuickCheck—i.e., functional programs whose inputs are drawn from structured, 
potentially infinite data types like lists and trees? 

Our first contribution is showing how to generalize the definition of combi- 
natorial coverage to work with regular tree expressions, which themselves gen- 
eralize the algebraic data types found in most functional languages. Instead of 
covering combinations of parameter choices, we measure coverage of test de- 
scriptions—concise representations of sets of tests, encoding potentially inter- 
esting interactions between data constructors. For example, the test description 
$\mathsf{cons}(\mathsf{true}, \Diamond\mathsf{false})$ describes the set of Boolean lists that have true as their first
element, followed by at least one false somewhere in the tail. 

Our second contribution is a method for enhancing property-based testing 
using combinatorial coverage. We propose an algorithm that uses combinato- 
rial coverage information to thin an existing random generator, leading it to 
more interesting test suites that find bugs more often. A concrete realization 
of this algorithm in a tool called QuickCover was able, in our experiments, to 
guide random generation to find bugs using an average of 10x fewer tests than 
QuickCheck. While generating test suites is (considerably) slower, running the 
tests can be much faster. As such, QuickCover excels in settings where tests are 
particularly costly to run, as well as in situations like continuous-integration, 
when the cost of test generation is amortized over many runs of the test suite. 

In summary, we offer these contributions: 


— We generalize the notion of combinatorial coverage to work over a set of 
test descriptions and show how this new definition generalizes to algebraic 
data types with the help of regular tree expressions (Section 3). Section 4 
describes the technical details behind the specific way we choose to represent 
these descriptions. 

— We propose a process for guiding the test distribution of an existing random 
generator based on our generalized notion of combinatorial coverage (Section 
5). 

— Finally, we demonstrate, with two case studies, that QuickCover can find 
bugs using significantly fewer tests (Section 6) than pure random testing. 


We conclude with an overview of related work (Section 7), and ideas for future 
work (Section 8). 


2 Classical Combinatorial Testing 


To set the stage, we begin with a brief review of “classical” combinatorial testing. 

Combinatorial testing measures the “combinatorial coverage” of test suites, 
aiming to find more bugs with fewer tests. Standard presentations [16] are 
phrased in terms of a number of separate input parameters. Here, for notational 
consistency with the rest of the paper, we will instead assume that a program 
takes a single input consisting of a tuple of values. 

Assume we are given some finite set $\mathcal{C}$ of constructors, and consider the set
of $n$-tuples over $\mathcal{C}$:

$$\{\mathsf{tuple}_n(C_1, \ldots, C_n) \mid C_1, \ldots, C_n \in \mathcal{C}\}$$

(The “constructor” $\mathsf{tuple}_n$ is not strictly needed in this section, but it makes
the generalization to constructor trees and regular tree expressions in Section 3
smoother.) We can use these tuples to represent test inputs to systems. For
example, a web application might be tested under configurations such as

$$\mathsf{tuple}_4(\mathsf{Safari}, \mathsf{MySQL}, \mathsf{Admin}, \mathsf{English})$$

in order to verify some end-to-end property of the system.
A specification of a set of tuples is written informally using notation like:

$$\mathsf{tuple}_4(\mathsf{Safari} + \mathsf{Chrome},\ \mathsf{Postgres} + \mathsf{MySQL},\ \mathsf{Admin} + \mathsf{User},\ \mathsf{French} + \mathsf{English})$$


This specification restricts the set of valid tests to those that have valid browsers 
in the first position, valid databases in the second, and so on. Specifications are 
thus a lot like types—they pick out a set of valid tests from some larger set. We 
define this notation precisely in Section 3. 

To define combinatorial coverage, we introduce the notion of partial tuples,
i.e., tuples where some elements are left indeterminate (written $\top$). For example:

$$\mathsf{tuple}_4(\mathsf{Chrome}, \top, \mathsf{Admin}, \top).$$

A description is compatible with a specification if its concrete (non-$\top$) constructors
are valid in the positions where they appear. Thus, the description above is
compatible with our web-app configuration specification, while this one is not:

$$\mathsf{tuple}_4(\mathsf{MySQL}, \mathsf{MySQL}, \mathsf{French}, \top)$$

We say a test covers a description (which, conversely, describes the test)
when the tuple matches the description in every position that does not contain
$\top$. For example, the description

$$\mathsf{tuple}_4(\mathsf{Chrome}, \top, \mathsf{Admin}, \top)$$

describes these tests:

$$\begin{array}{l}
\mathsf{tuple}_4(\mathsf{Chrome}, \mathsf{MySQL}, \mathsf{Admin}, \mathsf{English}) \\
\mathsf{tuple}_4(\mathsf{Chrome}, \mathsf{MySQL}, \mathsf{Admin}, \mathsf{French}) \\
\mathsf{tuple}_4(\mathsf{Chrome}, \mathsf{Postgres}, \mathsf{Admin}, \mathsf{English}) \\
\mathsf{tuple}_4(\mathsf{Chrome}, \mathsf{Postgres}, \mathsf{Admin}, \mathsf{French})
\end{array}$$

Finally, we call a description $t$-way if it fixes exactly $t$ constructors, leaving the
rest as $\top$.
Now, suppose a system under test takes a tuple of configuration values as 
input. Given some correctness property (e.g., the system does not crash), a test 
for the system is simply a particular tuple, while a test suite is a set of tuples. 
We can then define combinatorial coverage as follows: 


Definition 1. The t-way combinatorial coverage of a test suite is the proportion 
of t-way descriptions, compatible with a given specification, that are covered by 
some test in the suite.


We say that t is the strength of the coverage. 


A test suite with 100% 2-way coverage for the present example can be quite 
small. For example, 


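$$\begin{array}{l}
\mathsf{tuple}_4(\mathsf{Chrome}, \mathsf{Postgres}, \mathsf{Admin}, \mathsf{English}) \\
\mathsf{tuple}_4(\mathsf{Chrome}, \mathsf{MySQL}, \mathsf{User}, \mathsf{French}) \\
\mathsf{tuple}_4(\mathsf{Safari}, \mathsf{Postgres}, \mathsf{User}, \mathsf{French}) \\
\mathsf{tuple}_4(\mathsf{Safari}, \mathsf{MySQL}, \mathsf{Admin}, \mathsf{French}) \\
\mathsf{tuple}_4(\mathsf{Safari}, \mathsf{MySQL}, \mathsf{User}, \mathsf{English})
\end{array}$$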


achieves 100% coverage with just five tests. The fact that a single test covers 
many different descriptions is what makes combinatorial testing work: while the 
number of descriptions that must be covered is combinatorially large, a single 
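test can cover combinatorially many descriptions. In general, for a tuple of size $n$,
the number of $t$-way descriptions is given by the $\binom{n}{t}$ ways to choose $t$ parameters, multiplied
by the number of ways to assign values to the chosen parameters (for instance,
$\binom{4}{2} \cdot 2^2 = 24$ descriptions for 2-way coverage in the present example).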
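Definition 1 can be read as a small computation over partial tuples. The following sketch enumerates the $t$-way descriptions compatible with the web-app specification and measures the coverage of a suite; the names `descriptions`, `covers` and `coverage`, and the use of `None` for an indeterminate position, are assumptions made for illustration.

```python
from itertools import combinations, product

TOP = None  # stands for an indeterminate position in a partial tuple

# The running web-app specification: valid constructors per position.
spec = [("Safari", "Chrome"), ("Postgres", "MySQL"),
        ("Admin", "User"), ("French", "English")]

def descriptions(spec, t):
    """Yield all t-way descriptions compatible with spec, as partial tuples."""
    n = len(spec)
    for positions in combinations(range(n), t):
        for values in product(*(spec[i] for i in positions)):
            d = [TOP] * n
            for i, v in zip(positions, values):
                d[i] = v
            yield tuple(d)

def covers(test, description):
    """A test covers a description if it agrees on every determinate position."""
    return all(d is TOP or d == x for x, d in zip(test, description))

def coverage(suite, spec, t):
    """Proportion of compatible t-way descriptions covered by some test."""
    ds = list(descriptions(spec, t))
    covered = sum(1 for d in ds if any(covers(test, d) for test in suite))
    return covered / len(ds)

suite = [
    ("Chrome", "Postgres", "Admin", "English"),
    ("Chrome", "MySQL", "User", "French"),
    ("Safari", "Postgres", "User", "French"),
    ("Safari", "MySQL", "Admin", "French"),
    ("Safari", "MySQL", "User", "English"),
]

print(len(list(descriptions(spec, 2))))  # number of 2-way descriptions
print(coverage(suite, spec, 2))          # coverage of the five-test suite
print(coverage(suite[:2], spec, 2))      # coverage of only the first two tests
```

Each test covers six 2-way descriptions, one per pair of positions, which is why so few tests are enough.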


3 Generalizing Coverage 


Of course, inputs to programs are often more complex than just tuples of enu- 
merated values, especially in the world of functional programming. To apply the 
ideas of combinatorial coverage in this richer world, we generalize tuples to con- 
structor trees and tuple specifications to regular tree expressions. We can then 
give a generalized definition of test descriptions that makes sense for algebraic 
data types, setting up for a more powerful definition of combinatorial coverage. 

A ranked alphabet $\Sigma$ is a finite set of atomic data constructors, each with a
specified arity. For example, the ranked alphabet

$$\Sigma_{\mathsf{list(bool)}} \triangleq \{(\mathsf{cons}, 2), (\mathsf{nil}, 0), (\mathsf{true}, 0), (\mathsf{false}, 0)\}$$

defines the constructors needed to represent lists of Booleans. Given a ranked
alphabet $\Sigma$, the set of trees over $\Sigma$ is the least set $\mathcal{T}_\Sigma$ that satisfies the equation

$$\mathcal{T}_\Sigma = \{C(t_1, \ldots, t_n) \mid (C, n) \in \Sigma \land t_1, \ldots, t_n \in \mathcal{T}_\Sigma\}.$$
Regular tree expressions are a compact and powerful tool for specifying sets
of trees [9,10]. They are generated by the following syntax:

$$\begin{array}{rcl}
e & \triangleq & \top \\
  & \mid & e_1 + e_2 \\
  & \mid & \mu X.\ e \\
  & \mid & X \\
  & \mid & C(e_1, \ldots, e_n) \quad \text{for } (C, n) \in \Sigma
\end{array}$$


Each of these operations has an analog in standard regular expressions over 
strings: $+$ corresponds to disjunction of regular expressions, $\mu$ corresponds to
iteration, and the parent-child relationship corresponds to concatenation. These 
expressions give us a rich language for describing tree structures. 

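The denotation function $\llbracket\cdot\rrbracket$ mapping regular tree expressions to sets of trees
is the least function satisfying the equations:

$$\begin{aligned}
\llbracket\top\rrbracket &= \mathcal{T}_\Sigma \\
\llbracket C(e_1, \ldots, e_n)\rrbracket &= \{C(t_1, \ldots, t_n) \mid t_i \in \llbracket e_i \rrbracket\} \\
\llbracket e_1 + e_2 \rrbracket &= \llbracket e_1 \rrbracket \cup \llbracket e_2 \rrbracket \\
\llbracket \mu X.\ e \rrbracket &= \llbracket e[\mu X.\ e / X] \rrbracket
\end{aligned}$$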


Regular tree expressions subsume standard first-order algebraic data type 
definitions. For example, the Haskell definition 


data BoolList = Cons Bool BoolList | Nil 
is equivalent to the regular tree expression 
$$\mu X.\ \mathsf{cons}(\mathsf{true} + \mathsf{false}, X) + \mathsf{nil}.$$


Crucially for our purposes, regular tree expressions can also be used to define sets 
of trees that cannot be described with plain ADTs. For example, the expression 


$$\mathsf{cons}(\mathsf{true} + \mathsf{false}, \mathsf{nil})$$
denotes all single-element Boolean lists, while
$$\mu X.\ \mathsf{cons}(\mathsf{true}, X) + \mathsf{nil}$$


describes the set of lists that only contain true. Regular tree expressions can even 
express constraints like “true appears at some point in the list”: 


$$\mu X.\ \mathsf{cons}(\top, X) + \mathsf{cons}(\mathsf{true}, \mu Y.\ \mathsf{cons}(\top, Y) + \mathsf{nil})$$


This machinery smoothly generalizes the structures we saw in Section 2.
Tuples are just a special form of trees, while specifications and test descriptions 
can be written as regular tree expressions. This gives us most of what we need 
to define algebraic data types. 
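The denotation of a regular tree expression can be turned into a membership test by unfolding $\mu$ on demand. The sketch below is one way to do this and is not taken from the paper's implementation: the tagged-tuple encoding of trees and expressions, the helper names, and the assumption that every $\mu$-bound variable occurs underneath a constructor (so that each unfolding consumes part of the finite tree) are all illustrative choices.

```python
TOP = ("top",)

def con(name, *args):
    return ("con", name, args)

def alt(e1, e2):
    return ("alt", e1, e2)

def mu(var, body):
    return ("mu", var, body)

def var(name):
    return ("var", name)

def member(tree, expr, env=None):
    """Decide whether tree lies in the denotation of expr.

    Trees are tuples (label, child, ...). Assumes guarded recursion.
    """
    env = env or {}
    kind = expr[0]
    if kind == "top":
        return True
    if kind == "con":
        _, name, args = expr
        label, *children = tree
        return (label == name and len(children) == len(args)
                and all(member(c, a, env) for c, a in zip(children, args)))
    if kind == "alt":
        return member(tree, expr[1], env) or member(tree, expr[2], env)
    if kind == "mu":
        _, name, body = expr
        # Bind the variable to the whole mu-expression (one unfolding step).
        return member(tree, body, {**env, name: (expr, env)})
    if kind == "var":
        bound, saved_env = env[expr[1]]
        return member(tree, bound, saved_env)
    raise ValueError(f"unknown expression: {expr!r}")

def blist(*bits):
    """Build the tree of a Boolean list."""
    tree = ("nil",)
    for b in reversed(bits):
        tree = ("cons", ("true",) if b else ("false",), tree)
    return tree

# mu X. cons(true + false, X) + nil
bool_list = mu("X", alt(con("cons", alt(con("true"), con("false")), var("X")),
                        con("nil")))
# mu X. cons(T, X) + cons(true, mu Y. cons(T, Y) + nil)
some_true = mu("X", alt(con("cons", TOP, var("X")),
                        con("cons", con("true"),
                            mu("Y", alt(con("cons", TOP, var("Y")), con("nil"))))))

print(member(blist(True, False), bool_list))
print(member(blist(False, True, False), some_true))
print(member(blist(False, False), some_true))
```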
Recall the definition of t-way combinatorial coverage: “the proportion of (1) 
t-way descriptions, (2) compatible with a given specification, that (3) are covered 
by some test in the suite.” What does this mean in the context of regular tree 
expressions and trees? 

Condition (3) is easy: a test (i.e., a tree) $t$ covers a test description (a regular
tree expression) $d$ if $t \in \llbracket d \rrbracket$.

For (2), consider some regular tree expression $\tau$ representing an algebraic
data type that we would like to cover. We say that a description $d$ is compatible
with $\tau$ if $\llbracket\tau\rrbracket \cap \llbracket d \rrbracket \neq \emptyset$. As with string regular expressions, this can be checked
efficiently.

The only remaining question is (1): which set of t-way descriptions to use. 
We argue in the next section that the set of all regular tree expressions is too 
broad, and we offer a simple and natural alternative. 


4 Sparse Test Descriptions 


A naive way to generalize the definition of t-way descriptions to regular tree 
expressions would be to first define the size of a regular tree expression as the 
number of operators (constructors, $+$, or $\mu$) in it and then define a t-way description
to be any regular tree expression of size t. However, this approach does
not specialize nicely to the classical case; for example the description 


$$\mathsf{tuple}_4(\mathsf{Safari} + \mathsf{Chrome}, \top, \top, \top)$$


would be counted as “4-way” (3 constructors and 1 “+” operator), even though it 
is covered by every well-formed test. Worse, “interesting” descriptions are often 
quite large. For example, the smallest possible description of lists in which true 
is followed by false, 


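$$\mu X.\ \mathsf{cons}(\top, X) + \mathsf{cons}(\mathsf{true}, \mu Y.\ \mathsf{cons}(\top, Y) + \mathsf{cons}(\mathsf{false}, \mu Z.\ \mathsf{cons}(\top, Z) + \mathsf{nil}))$$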


has size t = 14. We want a representation that packs as much information as 
possible into small descriptions, making t-way coverage meaningful for small 
values of t and increasing the complexity of the interactions captured by our 
definition of coverage. 

In sum, we want a definition of coverage that straightforwardly specializes 
to the tuples-of-constructors case and that captures interesting structure with 
small descriptions. 

Our proposed solution, described next, takes inspiration from temporal logic. 
We first encode an “eventually” ($\Diamond$) operator that allows us to write the expression
from above much more compactly as $\Diamond\mathsf{cons}(\mathsf{true}, \Diamond\mathsf{false})$. This can be read
as “somewhere in the tree, there is a cons node with a true node to its left and a
false node somewhere in the tree to its right.” Then we define a restricted form
of sparse test descriptions using just $\Diamond$, $\top$, and constructors.
4.1 Encoding “Eventually” 


The “eventually” operator can actually be encoded using the regular tree ex- 
pression operators we have already defined—i.e., we can add it without adding 
any formal power. First, define the set of templates for the ranked alphabet $\Sigma$:

$$\mathcal{T} \triangleq \{C(\top_1, \ldots, \top_{i-1}, [\,], \top_{i+1}, \ldots, \top_n) \mid (C, n) \in \Sigma,\ 1 \le i \le n\}$$

That is, for each constructor $C$ in $\Sigma$, the set of templates $\mathcal{T}$ contains
$C([\,], \top, \ldots, \top)$, $C(\top, [\,], \top, \ldots, \top)$, etc., all the way to $C(\top, \ldots, \top, [\,])$,
enumerating every way to place one hole in the constructor and fill every other
argument slot with $\top$. (Nullary constructors are ignored.) Then we define “next”
($\bigcirc e$) and “eventually” ($\Diamond e$) as

$$\bigcirc e \triangleq \mathop{+}_{T \in \mathcal{T}} T[e] \qquad\qquad \Diamond e \triangleq \mu X.\ e + \bigcirc X$$

where $T[e]$ is the replacement of $[\,]$ in $T$ with $e$. Intuitively, $\bigcirc e$ describes any
tree $C(t_1, \ldots, t_n)$ in which $e$ describes some direct child (i.e., $t_1$, $t_2$, and so
on), while $\Diamond e$ describes anything described by $e$, plus (unrolling the $\mu$) anything
described by $\bigcirc e$, $\bigcirc\bigcirc e$, and so on.

This is not the only way to design a compact, expressive subset of regular 
tree expressions, but our evaluation shows that this has useful properties. In 
addition, the $\Diamond$ notation gives an elegant way to write descriptions like the one
from the previous section ($\Diamond\mathsf{cons}(\mathsf{true}, \Diamond\mathsf{false})$), neatly capturing “somewhere in
the tree” constraints that would require many more symbols in the bare language 
of regular tree expressions. 


4.2 Defining Coverage 


Even in the language with just $\Diamond$, $\top$, and constructors, there is still a fair amount
of freedom in how we define the set of t-way descriptions. In this section we 
present one possibility that we have found to be useful in practice; in Section 8 
we discuss another interesting option. 

The set of sparse test descriptions for a given $\Sigma$ is the trees generated by

$$\begin{array}{rcl}
d & \triangleq & \top \\
  & \mid & \Diamond C(d_1, \ldots, d_n) \quad \text{for } (C, n) \in \Sigma,
\end{array}$$

that is, trees consisting of $\top$ and constructors prefixed by $\Diamond$. We call these descriptions
“sparse” because they match specific ancestor-descendant arrangements of
constructors but place no restriction on the constructors in between, due to the
“eventually” before each constructor.

3 This construction is why we choose to deal with finite ranked alphabets: if $\Sigma$ were
infinite, $\mathcal{T}$ would be infinite, and $\bigcirc e$ would be an infinite term that is not expressible
as a standard regular tree expression.

Sparse test descriptions are designed to be compact, useful in practice, and 
compatible with the classical definition of coverage. For that reason we aim 
to keep them as information-dense as possible. First, we do not include the $\mu$
operator directly, instead relying on $\Diamond$: indeed, $\Diamond$ captures a pattern of recursion
that is general enough to express interesting non-local constraints while keeping
description complexity low. Similarly, we do not need to include the $+$ operator:
any test that covers either $C(d_1, \ldots, d_n)$ or $D(d_1, \ldots, d_m)$
will also necessarily cover $C(d_1, \ldots, d_n) + D(d_1, \ldots, d_m)$.

Removing explicit uses of $\mu$ and $+$ does limit the expressive power of sparse
test descriptions a little—for example it rules out complex mutually recursive 
definitions. However, we do not intend to use descriptions to specify entire lan- 
guages, only fragments of languages that we hope to cover with testing. Natu- 
rally, there are many other possible formats for test descriptions that would be 
interesting to explore—we leave that for future work. In this paper, we chose to 
make descriptions very compact while preserving most of their expressive power, 
and the case studies in Section 6 demonstrate that such a choice works well in 
at least two challenging domains that are relevant to programming languages as 
a whole. 

Finally, we define the size of a description based on the number of construc- 
tors it contains. Intuitively, a t-way description is one with t constructors; how- 
ever, in order to be consistent with the classical definition, we omit constructors 
whose types permit no alternatives. For example, all of the tuple constructors 
(e.g. $\mathsf{tuple}_4$ in our running example) are left out of the size calculation. This
makes t-way sparse test description coverage specialize to exactly classical t- 
way parameter interaction coverage for the case of tuples of sums of nullary 
constructors. 
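The size calculation just described can be sketched as a short recursive function. The encoding is an assumption made for illustration: a description is either `TOP` or a pair `(name, children)` read as "eventually `name` applied to `children`", and the set of constructors without alternatives is passed in explicitly rather than derived from a type.

```python
TOP = None  # the description that matches anything

def size(description, no_alternatives=frozenset()):
    """Count the constructors of a sparse description, skipping constructors
    whose type permits no alternative (such as the tuple constructor)."""
    if description is TOP:
        return 0
    name, children = description
    own = 0 if name in no_alternatives else 1
    return own + sum(size(c, no_alternatives) for c in children)

# A tuple description fixing the browser and the database.
d_tuple = ("tuple4", [("Chrome", []), ("MySQL", []), TOP, TOP])
# A list description: a cons node with true somewhere in its left child.
d_list = ("cons", [("true", []), TOP])

print(size(d_tuple, {"tuple4"}))  # the tuple constructor is not counted
print(size(d_list))               # cons has an alternative (nil), so it counts
```

With the tuple constructor excluded, the first description has size 2, matching the classical notion of a 2-way interaction.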

Sparse descriptions work as expected for types like 


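$$\mathsf{tuple}_4(\mathsf{Safari} + \mathsf{Chrome},\ \mathsf{Postgres} + \mathsf{MySQL},\ \mathsf{Admin} + \mathsf{User},\ \mathsf{French} + \mathsf{English}).$$
Despite some stray occurrences of $\Diamond$, as in
$$\Diamond\mathsf{tuple}_4(\Diamond\mathsf{Chrome}, \Diamond\mathsf{MySQL}, \top, \top),$$

the descriptions still describe the same sets of tests as the standard tuple descriptions
without the uses of $\Diamond$. Thus, our new definition of combinatorial coverage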
generalizes the classical one. 

These descriptions capture a rich set of test constraints in a compact form. 
The real proof of this is in our evaluation results—see Section 6 for those—but 
a few more examples may help illustrate. 


Boolean Lists As a first example, consider the type of Boolean lists: 


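$$\tau_{\text{list(bool)}} \triangleq \mu X.\ \text{cons}(\text{true} + \text{false}, X) + \text{nil}.$$

The set of all 2-way descriptions that are compatible with $\tau_{\text{list(bool)}}$ is:

$$\Diamond\text{cons}(\Diamond\text{true}, \top) \qquad \Diamond\text{cons}(\Diamond\text{false}, \top) \qquad \Diamond\text{cons}(\top, \Diamond\text{nil})$$
$$\Diamond\text{cons}(\top, \Diamond\text{cons}(\top, \top)) \qquad \Diamond\text{cons}(\top, \Diamond\text{true}) \qquad \Diamond\text{cons}(\top, \Diamond\text{false})$$


Unpacking the notation, $\Diamond\text{cons}(\Diamond\text{true}, \top)$ describes the set of trees where “at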
some point in the tree there is a cons node with a true node somewhere in its 
left child.” 
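To make this reading concrete, here is a minimal Python sketch of a coverage check for sparse descriptions. It assumes terms are encoded as tuples `(constructor, child, ...)` and descriptions as `TOP` or `(constructor, [child descriptions])`; the encoding and the helper names `subterms` and `covers` are hypothetical and chosen only for illustration.

```python
TOP = "TOP"  # the description that matches every term


def subterms(term):
    """Yield `term` and all of its descendants."""
    yield term
    for child in term[1:]:
        yield from subterms(child)


def covers(term, desc):
    """True if some subterm of `term` has the constructor named in `desc`
    and each of its children covers the corresponding child description."""
    if desc == TOP:
        return True
    name, child_descs = desc
    for sub in subterms(term):
        children = sub[1:]
        if (sub[0] == name and len(children) == len(child_descs)
                and all(covers(c, d) for c, d in zip(children, child_descs))):
            return True
    return False


TRUE, FALSE, NIL = ("true",), ("false",), ("nil",)


def cons(head, tail):
    return ("cons", head, tail)


false_true = cons(FALSE, cons(TRUE, NIL))      # the list [false, true]
cons_true = ("cons", [("true", []), TOP])      # ◇cons(◇true, ⊤)
cons_nil = ("cons", [TOP, ("nil", [])])        # ◇cons(⊤, ◇nil)
```

With this encoding, the list [false, true] covers both example descriptions, while the one-element list [false] does not cover the first.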


Arithmetic Expressions Consider the type of simple arithmetic expressions over 
the constants 0, 1, and 2: 


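$$\tau_{\text{expr}} \triangleq \mu X.\ \text{add}(X, X) + \text{mul}(X, X) + 0 + 1 + 2.$$

This type has 2-way descriptions like

$$\Diamond\text{add}(\Diamond\text{mul}(\top, \top), \top) \quad\text{and}\quad \Diamond\text{mul}(\top, \Diamond\text{add}(\top, \top)),$$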
which capture different nestings of addition and multiplication. 


System F For a more involved example, let’s look at some 2-way sparse descrip- 
tions for a much more complex data structure: terms of the polymorphic lambda 
calculus, System F. 


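$$\tau \triangleq U \mid \tau_1 \to \tau_2 \mid n \mid \forall.\ \tau$$

$$e \triangleq () \mid n \mid \lambda\tau.\ e \mid (e_1\ e_2) \mid \Lambda.\ e \mid (e\ \tau)$$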


(We use de Bruijn indices for variable binding, meaning that each variable oc- 
currence in the syntax tree is represented by a natural number indicating which 
enclosing abstraction it was bound by.) 

System F syntax can be represented using a regular tree expression like 


$$\mu X.\ \text{unit} + \text{var}(\text{VAR}) + \text{abs}(\text{TYPE}, X) + \text{app}(X, X) + \text{tabs}(X) + \text{tapp}(X, \text{TYPE}),$$


where TYPE is defined in a similar way and VAR represents natural-number 
de Bruijn indices. 
This already admits useful 2-way descriptions like 


$$\Diamond\text{app}(\Diamond\text{abs}(\top, \top), \top) \quad\text{and}\quad \Diamond\text{app}(\Diamond\text{app}(\top, \top), \top),$$


which capture relationships between lambda abstractions and applications. In 
Section 6.1, we use descriptions like these to find bugs in an evaluator for System 
F expressions; they ensure that our test suite adequately covers different nestings 
of abstractions and applications that might provoke bugs. 

With a little domain-specific knowledge, we can make the descriptions cap- 
ture even more. When setting up our case study in Section 6.2, which searches 
for bugs in GHC’s strictness analyzer, we found that it was often useful to track 
coverage of the seq function, which takes two functions as arguments, executes
the first for any side-effects (e.g., exceptions), and then executes the second.
Modifying our regular expression type to include seq as a first-class constructor
results in 2-way descriptions like


$$\Diamond\text{seq}(\Diamond\text{app}(\top, \top), \top)$$


that encode interactions of seq with other System F constructors. These in- 
teractions are crucial for finding bugs in a strictness analyzer, since seq gives 
fine-grained control over the evaluation order within a Haskell expression. 


5 Thinning Generators with QuickCover 


Having generalized the definition of combinatorial coverage to structured data 
types, the next step is to explore ways of using coverage to improve property- 
based testing. 

When we first approached this problem, we planned to follow the conventional 
combinatorial testing methodology of generating covering arrays [38], i.e., test 
suites with 100% t-way coverage for a given t. Rather than use an unbounded 
stream of random tests—the standard methodology in property-based testing— 
we would test properties using just the tests in some pre-generated covering 
array. However, we encountered two major problems with this approach. First, 
as t grows, covering arrays become frighteningly expensive to generate. While 
there are efficient methods for generating covering arrays in special cases like 
2-way coverage [8], general algorithms for generating compact covering arrays 
are complex and often slow [23]. Second, we found that covering arrays for sets 
of test descriptors in the format described above did not do particularly well 
at finding bugs! In a series of preliminary experiments with one of our case 
studies, we found that with 4-way coverage (the highest we could generate in 
reasonable time), our covering arrays did not reliably catch all of the bugs in our 
test system. Fortunately, after some more head scratching and experimenting, we 
discovered an alternate approach that works quite well. The trick is to embrace 
the randomness that makes property-based testing so effective. 

In the remainder of this section, we first present an algorithm that uses com- 
binatorial coverage to “thin” a random generator, guiding it to more interesting 
inputs. Rather than generating a fixed set of tests in the style of covering arrays, 
this approach produces an unbounded stream of interesting test inputs. Then we 
discuss some concrete details behind QuickCover, the Haskell implementation of 
our algorithm that we used to obtain the experimental results in Section 6. 


5.1 Online Generator Thinning 


The core of our algorithm is QuickCheck’s standard generate-and-test loop. 
Given a test generator gen and a property p, QuickCheck generates inputs
repeatedly until either (1) the property fails, or (2) a time limit is reached.

    QuickCheck(gen, p):
        repeat LIMIT times:
            # Generate 1 new input
            x = gen()
            # Check the property
            if !p(x), return False
        return True


LIMIT is chosen based on the user’s specific testing budget, and it can vary 
significantly in practice. In the experiments below, we know a priori that a bug 
exists in the program, so we set LIMIT to infinity and just run tests until the 
property fails. 
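The loop above can be sketched in runnable Python as follows. This is an illustration of the generate-and-test idea, not QuickCheck's actual Haskell API; the function name `quick_check`, the `limit` default, and the returned counterexample are assumptions of the sketch.

```python
import random


def quick_check(gen, prop, limit=1000):
    """Run up to `limit` random tests; return (passed, counterexample)."""
    for _ in range(limit):
        x = gen()              # generate one new input
        if not prop(x):        # check the property
            return False, x
    return True, None


rng = random.Random(0)  # fixed seed so the run is repeatable
passed, counterexample = quick_check(
    gen=lambda: rng.randint(0, 99),
    prop=lambda x: x < 50,     # deliberately false for half the inputs
)
```

Returning the failing input alongside the Boolean is a convenience added here; the pseudocode only reports success or failure.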


Our algorithm modifies this basic one to use combinatorial coverage infor- 
mation when choosing the next test to run. 


    QuickCover(strength, fanout, gen, p):
        coverage = initCoverage()
        repeat LIMIT times:
            # Generate fanout potential inputs
            xs = listOf(gen(), fanout)
            # Find the input with the best improved coverage
            x = argmax[x in xs] (
                    coverageImprovement(x, coverage, strength) )
            # Check the property
            if !p(x), return False
            # Update the coverage information
            coverage = updateCoverage(x, coverage, strength)
        return True


The key idea is that, instead of generating a single input at each iteration, we 
generate several (controlled by the parameter fanout) and select the one that 
increases combinatorial coverage the most. We test the property on that input 
and, if it does not fail, update the coverage information based on the test we ran 
and keep going. 
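A minimal Python rendering of this loop is sketched below. The three coverage operations are passed in as callables so that the loop stays generic; the toy instantiation, in which an input "covers" only itself, is a stand-in invented for this example and is not how QuickCover computes coverage.

```python
from itertools import cycle


def quick_cover(strength, fanout, gen, prop,
                init_coverage, coverage_improvement, update_coverage,
                limit=1000):
    """Generate-and-test loop that runs only the most promising candidate."""
    coverage = init_coverage()
    for _ in range(limit):
        candidates = [gen() for _ in range(fanout)]
        # max() keeps the first candidate among ties
        x = max(candidates,
                key=lambda c: coverage_improvement(c, coverage, strength))
        if not prop(x):
            return False, x
        coverage = update_coverage(x, coverage, strength)
    return True, None


# Toy instantiation: an input "covers" only itself, tracked in a plain set.
source = cycle([0, 0, 1, 2])
executed = []


def record_and_pass(x):
    executed.append(x)
    return True


result = quick_cover(
    strength=1, fanout=4,
    gen=lambda: next(source),
    prop=record_and_pass,
    init_coverage=set,
    coverage_improvement=lambda x, cov, t: 0 if x in cov else 1,
    update_coverage=lambda x, cov, t: cov | {x},
    limit=3,
)
```

Each batch of four candidates is `[0, 0, 1, 2]`, yet the three tests actually executed are 0, 1, and 2: the thinning step skips inputs that add nothing new.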


This algorithm is generic with respect to the representation for coverage in- 
formation, but the particular choice of data structure and interpretation makes a 
significant difference in both efficiency and effectiveness. In our implementation, 
coverage information is represented by a multi-set of descriptions:

    initCoverage():
        return emptyMultiset()

    coverageImprovement(x, coverage, strength):
        ds = descriptions(x, strength)
        return sum([ 1 / (count(d, coverage) + 1)
                     for d in ds ])

    updateCoverage(x, coverage, strength):
        return union(descriptions(x, strength), coverage)


At the beginning, the multi-set is empty; as testing progresses, each test 
is evaluated based on coverageImprovement. If a description d had previously 
been covered n times, it contributes $\frac{1}{n+1}$ to the score. For example, if a test
input covers $d_1$ and $d_2$, where previously $d_1$ was not covered and $d_2$ was covered
3 times, the total score for the test input would be 1 + 0.25 = 1.25.
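The scoring rule and the worked example can be reproduced with Python's `collections.Counter` as the multi-set. In this sketch the functions take the list of descriptions directly rather than computing them from an input and a strength; that simplification, and the string names `"d1"` and `"d2"`, are assumptions of the example.

```python
from collections import Counter


def init_coverage():
    return Counter()                # empty multi-set


def coverage_improvement(ds, coverage):
    """Score the descriptions `ds` of one candidate input."""
    return sum(1 / (coverage[d] + 1) for d in ds)


def update_coverage(ds, coverage):
    return coverage + Counter(ds)   # multi-set union


coverage = Counter({"d2": 3})       # d2 already covered three times
score = coverage_improvement(["d1", "d2"], coverage)
coverage = update_coverage(["d1", "d2"], coverage)
```

Here `score` evaluates to 1 + 1/4 = 1.25, and after the update the multi-set records one occurrence of `d1` and four of `d2`.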

At first glance, one might think of a simpler approach based on sets instead of 
multi-sets. Indeed, this was the first thing we tried, but it turned out to perform 
substantially worse than the multiset-based one in our experiments. The reason 
is that just covering each description once turns out not to be sufficient to find all 
bugs, and, once most descriptions have been covered, this approach essentially 
degenerates to normal random testing. By contrast, the multi-set representation 
continues to be useful over time; after each description has been covered once, 
the algorithm begins to favor inputs that cover descriptions a second time, then 
a third time, and so on. This allows QuickCover to generate arbitrarily large test 
suites that continue to benefit from combinatorial coverage. 

Keeping track of coverage information like this does create some overhead.⁴
For each test that QuickCover considers (including those that are never run), it
needs to analyze which descriptions the test covers and check those against the
current multi-set. This overhead means that QuickCover is often much slower
than QuickCheck with respect to generating tests. In the next section, we
explore use cases for QuickCover that overcome this overhead by running fewer 
tests. 


6 Evaluation 


Since QuickCover adds some overhead to generating tests, one might expect 
that it will be particularly well suited to situations where each test may be run 
many times. The primary goal of our experimental evaluation was to test this 
hypothesis. 


⁴ The overhead introduced is highly variable and based largely on the exact
implementation of the underlying test generator. Appendix A goes into slightly more detail
on the asymptotics, but broadly speaking the time it takes QuickCover to generate a test
is linear in the fan-out and exponential in the coverage strength.

Of course, running the same test repeatedly on the same code is pointless: if
it were ever going to fail, it would do so on the first run (ignoring the thorny 
possibility of “flaky tests” due to nondeterminism [25]). However, running the 
same test on successive versions of the code is not only useful; it is standard 
practice in two common settings: regression testing, i.e., checking that code is still 
working after changes, and especially continuous integration, where regression 
tests are run automatically every time a developer checks in a new version of the 
code. In these settings, the overhead introduced by generating many tests and 
discarding some without running them can be amortized, since the same tests 
may be reused very many times, so that the cost of generating the test suite 
becomes less important than the cost of running it. 

In order to validate this theory, we designed two experiments using Quick- 
Cover. The primary goal of these experiments was to answer the question: Does 
QuickCover actually reduce the number of tests needed to find bugs in a real 
system? 

Both case studies answer this question in the affirmative. The first case study, 
in particular, demonstrates a situation where QuickCover needs, on average, 10x
fewer tests to find bugs, compared to pure random testing. We chose an evalu-
ator for System F terms as our example because it allows us to test how Quick- 
Cover behaves in a small but realistic scenario that requires a fairly complex 
random testing setup. Our second case study expands on results from Pałka et 
al. [32], scaling up and applying QuickCover to find bugs in the Glasgow Haskell 
Compiler (GHC) [27]. 

A secondary goal of our evaluation was to understand whether the generator 
thinning overhead is always too high to make QuickCover useful for real-time 
property-based testing, or if there are any cases where using QuickCover would 
yield a wall-clock improvement even if tests are only run once. Our second case 
study answers this question in the affirmative. 


6.1 Case Study: Normalization Bugs in System F 


Our first case study uses combinatorial coverage to thin a highly tuned and 
optimized test generator for System F [12,35] terms. The generator produces 
well-typed System F terms by construction (no mean feat on its own) and is 
tuned to produce a highly varied distribution of different terms. Despite all the 
care put into the base generator, we found that modifying the test distribution 
using QuickCover results in a test suite that finds bugs with many fewer inputs. 

Generating “interesting” programs (for finding compiler bugs, for example) 
is an active research area. For instance, a generator for well-typed simply typed 
lambda-terms has been used to reveal bugs in GHC [6, 20, 32], while a generator
for C programs that avoid “undefined behaviors” has been used to find many
bugs in production compilers [24, 34, 41]. The cited studies are all examples of
differential testing, where different compilers (or different versions of the same 
compiler) were run against each other on the same inputs to reveal discrepancies. 
Similarly, for the present case study we tested different evaluation strategies
for System F, comparing the behavior of various buggy versions to a reference
implementation.

Recall the definition of System F from Section 4.2. Let $e[v/n]$ stand for
substituting $v$ for variable $n$ in $e$, and $e\uparrow_n$ for “lifting”, that is, incrementing the indices
of all variables above $n$ in $e$. Then, for example, the standard rule for substituting
a type $\tau$ for variable $n$ inside a type abstraction $\Lambda.\ e$ requires lifting $\tau$ and
incrementing the de Bruijn index of the variable being substituted by one:


$$(\Lambda.\ e)[\tau/n] = \Lambda.\ e[\tau\uparrow_0 / n+1]$$


Here are two ways to get this wrong: forget to lift the variables, or forget to
increment the index. Those bugs would lead to the following erroneous definitions
(each drops one operation from the correct rule):


$$(\Lambda.\ e)[\tau/n] = \Lambda.\ e[\tau / n+1] \quad\text{and}\quad (\Lambda.\ e)[\tau/n] = \Lambda.\ e[\tau\uparrow_0 / n].$$
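The following Python sketch illustrates the correct rule and both mistakes on a small example. To keep it short, it works on types with a $\forall$ binder rather than on full System F terms, treats "above n" as "index at least the cutoff", and does not renumber other variables after substitution; all of these are simplifying assumptions, and the class and flag names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    index: int            # de Bruijn index


@dataclass(frozen=True)
class Arrow:
    dom: object
    cod: object


@dataclass(frozen=True)
class Forall:
    body: object          # binds index 0 inside `body`


def lift(ty, cutoff=0):
    """Increment every variable index >= cutoff."""
    if isinstance(ty, TVar):
        return TVar(ty.index + 1) if ty.index >= cutoff else ty
    if isinstance(ty, Arrow):
        return Arrow(lift(ty.dom, cutoff), lift(ty.cod, cutoff))
    return Forall(lift(ty.body, cutoff + 1))


def subst(ty, n, tau, do_lift=True, do_increment=True):
    """Replace variable n in ty by tau; clearing a flag plants one bug."""
    if isinstance(ty, TVar):
        return tau if ty.index == n else ty
    if isinstance(ty, Arrow):
        return Arrow(subst(ty.dom, n, tau, do_lift, do_increment),
                     subst(ty.cod, n, tau, do_lift, do_increment))
    inner_tau = lift(tau) if do_lift else tau
    inner_n = n + 1 if do_increment else n
    return Forall(subst(ty.body, inner_n, inner_tau, do_lift, do_increment))


# Inside the binder, index 1 refers to outer variable 0.
poly = Forall(Arrow(TVar(1), TVar(0)))
correct = subst(poly, 0, TVar(3))
no_lift = subst(poly, 0, TVar(3), do_lift=False)
no_increment = subst(poly, 0, TVar(3), do_increment=False)
```

The three results differ: the correct version yields index 4 under the binder, the version without lifting yields 3, and the version without the increment replaces the bound variable instead of the free one.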


Inspired by errors like these (specifically in the substitution and variable lifting 
functions), we inserted bugs by hand to create 19 “mutated” versions of two 
different evaluation relations. (The bugs are described in detail in Appendix C.) 
The two evaluation relations simplify terms in slightly different ways: the first 
implements standard big-step evaluation (eval), and the second uses a parallel 
evaluation relation to fully normalize terms (peval). (We chose to check both 
evaluation orders, since some mutations only cause a bug in one implementation 
or the other.) Since we were interested in bugs in either evaluation order, we 
tested a joint property: 


eval e == eval_mutated e && peval e == peval_mutated e 


Starting with a highly tuned generator for System F terms as our baseline, we 
used both QuickCheck and QuickCover to generate a stream of test values for 
e and measured the average number of tests required to find a bug (i.e., Mean- 
Tests-To-Failure, or MTTF) for each approach. 

Surprisingly, we found little or no difference in MTTF between 2-way, 3-way, 
and 4-way testing, but changing the fan-out did make a large impact. Figure 1 
shows both absolute MTTF for various choices of fan-out ($\log_{10}$ scale) and the
performance improvement as a ratio of un-thinned MTTF to thinned MTTF. All 
choices of fan-out produced better MTTF results than the baseline, but higher 
values of fan-out tended to be more effective on average. In our best experiment, 
a fan-out of 30 found a bug in an average of 15x fewer tests than the baseline; 
the overall average was about 10x better. Figure 2 shows the total MTTF im- 
provement across 19 bugs, compared to the maximum theoretical improvement. 
If our algorithm were able to perfectly pick the best test input every time, the 
improvement would be proportional to the fan-out (i.e., it is impossible for our
algorithm to be more than 10x better with a fan-out of 10). On the other hand,
if combinatorial coverage were irrelevant to test failure, then we would expect 
the QuickCover test suites to have the same MTTF as QuickCheck. It is clear 
from the figure that QuickCover is really quite effective in this setting: for small
fan-outs, it is very close to the theoretical optimum, and with a fan-out of 30 it
achieves about 1/3 of the potential improvement; that is, three QuickCover test
cases are more likely to provoke a bug than thirty QuickCheck ones.

Fig. 1. Top: System F MTTF, $\log_{10}$ scale, plotted in order of MTTF for un-thinned
random tests, t = 2. Bottom: System F MTTF ratio of MTTF for un-thinned random
tests to MTTF for QuickCover, t = 2.
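The fan-out bound on the improvement can be checked with a small calculation. The sketch assumes a simple model that is not taken from the evaluation itself: every random candidate independently provokes the bug with probability `p`, and a perfect selector runs a failing candidate whenever the batch contains one.

```python
def perfect_selection_speedup(p, fanout):
    """MTTF ratio (random / perfect) when each candidate fails with prob. p."""
    hit = 1 - (1 - p) ** fanout      # chance a batch contains a failing input
    return hit / p                   # (1 / p) / (1 / hit)


def fraction_of_potential(observed_speedup, fanout):
    """Share of the fan-out bound that an observed speedup achieves."""
    return observed_speedup / fanout


bound_10 = perfect_selection_speedup(0.001, 10)
share_30 = fraction_of_potential(10, 30)   # a 10x speedup at fan-out 30
```

Under this model the speedup approaches, but never exceeds, the fan-out (about 9.96 for a fan-out of 10 and a rare bug), and a 10x speedup at a fan-out of 30 corresponds to one third of the potential improvement.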


6.2 Case Study: Strictness Analysis Bugs in GHC 


To evaluate how our approach scales, and to investigate whether QuickCover can
be used not only to reduce the number of tests required but also to speed up
bug-finding, we replicated the case study of Pałka et al. [32], which found bugs in the
strictness analyzer of GHC 6.12 using a hand-crafted generator for well-typed
lambda terms; we reused their experimental setup, but used QuickCover to
thin their generator and produce better tests.

Fig. 2. System F, proportional reduction in total number of tests needed to find all
bugs.

Two attributes of this case study make it an excellent test of the capabilities 
of our combinatorial thinning approach. First, it found bugs in a real compiler 
by generating random well-typed lambda terms, and therefore we can evaluate 
whether the reduction in number of tests observed in the System F case study 
scales to a production setting. Second, running a test involves invoking the GHC 
compiler, a heavyweight external process. As a result, reducing the number of 
tests required to provoke a failure should (and does) lead to an observable im- 
provement in terms of wall-clock performance. 

Concretely, Pałka et al. generate a list of functions that manipulate lists of 
integers and compare the behavior of these functions on partial lists (lists with 
undefined elements or tails) when compiled with and without optimizations, 
another example of differential testing. They uncover errors in the strictness 
analyzer component of GHC’s optimizer that lead to inconsistencies where the 
un-optimized version of the compiled code correctly fails with an error while the 
optimized version prints something to the screen before failing: 

Input            | -O0 Output | -O2 Output
[undefined]      | Exception  | [Exception]
[1,undefined]    | Exception  | [1,Exception]
[1,2,undefined]  | Exception  | [1,2,Exception]


Finally, to balance the costly compiler invocation with the similarly costly
smart generation process, Pałka et al. group 1000 generated functions together
in a single module to be compiled; this number was chosen to strike a precise
50-50 balance between generation time and compilation/execution time for each
generated module. Since our thinning approach itself introduces approximately
a 25% overhead in generation time, we increased the number of tests per module
to 1250 to maintain the same balance and make a fair comparison.

We ran our experiments in a VirtualBox virtual machine running Ubuntu 12.04 (a version old
enough to allow for executing GHC 6.12.1), with 4GB RAM, on a host machine
with an i7-8700 @ 3.2GHz. We performed 100 runs of the original case study
and 100 runs of our variant that adds combinatorial thinning, using a fan-out
of 2 and a strength of 2. We found that our approach reduces the mean number
of tests required from 21268 ± 1349 to 14895 ± 1056, a 42% improvement, and
reduces the mean time to failure from 193 ± 13 seconds to 149 ± 12 seconds, a 30%
improvement.
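The percentages above can be recovered from the reported means. The sketch below assumes that "improvement" is read as the ratio of baseline to thinned minus one; that reading is an assumption of this example, chosen because it matches both quoted figures.

```python
def relative_improvement(baseline, thinned):
    """Improvement expressed as baseline / thinned - 1."""
    return baseline / thinned - 1


tests_gain = relative_improvement(21268, 14895)   # mean tests to failure
time_gain = relative_improvement(193, 149)        # mean seconds to failure
tests_per_module = int(1000 * 1.25)               # 25% generation overhead
```

Under that reading, the gain in tests comes out between 42% and 43%, the gain in time just under 30%, and scaling the module size by the 25% overhead gives the 1250 tests per module used in the comparison.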


7 Related Work 


A detailed survey of the (vast) combinatorial testing literature can be found 
in [30]. Here we discuss just the most closely related work, in particular, other 
attempts to generalize combinatorial testing to structured and infinite domains. 
We also discuss other approaches to property-based testing with goals similar
to ours, such as adaptive random testing and coverage-guided fuzzing.


7.1 Generalizations of Combinatorial Testing 


Salecker and Glesner [37] extend combinatorial testing to sets of terms generated 
by a context-free grammar. Their approach cleverly maps context-free grammar 
derivations up to some depth k to sets of parameter choices; then it uses standard 
full-coverage test suite generation algorithms to pick a subset of derivations to 
test. The main limitation of this approach is the parameter k. By limiting the 
derivation depth, this approach only defines coverage over a finite subset of the 
input type. By contrast, our definition of coverage works over infinite types by 
exploiting the recursive nature of the $\Diamond$ operator. We focus on description size
rather than term size, which provides more flexibility for “packing” multiple 
descriptions into a single test. 

Another approach to combinatorial testing of context-free inputs is due to 
Lämmel and Schulte [19]. Their system also uses a depth bound, but it provides
the user finer-grained control. At each node in the grammar, the user is free to 
limit the coverage requirements and prune unnecessary tests. This is an elegant 
solution for situations where the desired interactions are known a priori. Unfor- 
tunately, this approach needs to be re-tuned manually to every specific type and 
use-case, so it is not the general solution we were after. 

Finally, Kuhn et al. [15] present a notion of sequence covering arrays to
describe combinatorial coverage of sequences of events. We believe that t-way
sequence covering arrays in their system are equivalent to (2t − 1)-way full-coverage
test suites of the appropriate list type in ours. They also have a reasonably effi- 
cient algorithm for generating covering arrays in this specialized case. 

Our idea to use regular tree expressions for coverage is partly inspired by 
Usaola et al. [40] and Mariani et al. [26]. Rather than generate a set of terms to
cover an ADT, these works generate strings to cover (i.e., match in every possible
way) a particular regular expression. This turns out to be quite a different
problem, but these explorations led us to consider coverage in the context of formal
languages.


7.2 Comparison with Enumerative Property-Based Testing 


Another approach to property-based testing research is based on enumeration of 
small test cases, rather than random generation. Tools like SmallCheck [36] offer 
guarantees that there is no counterexample smaller than a certain limit, and 
moreover always report the smallest counterexample when it exists. To compare 
our approach with this type of tool, we repeated our System F evaluation with 
a variety of enumerative testing tools. 

We first tried SmallCheck, which enumerates all test cases up to a given 
depth. Unfortunately, the number of System F terms rises very rapidly with the 
depth: SmallCheck quickly enumerated 708 terms of depth up to three, but could 
not enumerate all terms of depth four within 20 minutes of CPU time.⁵ Only
one of the 19 bugs we planted was provoked by any of those 708 terms. 

However, SmallCheck wastes effort generating syntactically correct terms 
that are not type correct; only 140 of the 708 were well-typed. Lazy Small- 
Check [36] exploits laziness in property preconditions to discard many test cases 
in a group—in this case, all those terms that fail a type-check in the same way 
are discarded together. Because well-typedness is such a strong precondition, 
Lazy SmallCheck is able to dramatically reduce the number of terms needed at 
each depth, enabling us to increase the depth limit to 4, and generate over five 
million terms. The result was a much more comprehensive test suite than normal 
SmallCheck, but it still only found 8 out of our 19 bugs. 

The problem here is that the smallest counterexamples we are searching for 
are quite small terms, but may nevertheless have a few fairly deep nodes in their 
syntax trees. More recent enumerative tools, such as LeanCheck [3], enumerate 
test cases in size order, instead of in depth order, thus reaching terms with just a 
few deeper nodes much earlier in the enumeration. For this example, LeanCheck 
runs out of memory after about 11 million tests, but this was enough to find all
but four of the planted bugs. 

However, LeanCheck does not use the Lazy SmallCheck optimization, and 
so is mostly testing ill-typed terms, for which our property holds vacuously. 
SciFe [18] enumerates in size order and uses the Lazy SmallCheck optimization, 
with good results. It is hard to apply SciFe, which is designed to test Scala, to our 
Haskell code, so instead we created a Lazy SmallCheck variant that enumerates 
in size order. With this variant, we could find all of the planted bugs, with 
counterexample sizes varying from 5 to 14. Lazy SmallCheck does not report the 
number of tests needed to find a counterexample, just the size at which it was 
found, together with the number of test cases of each size. We can therefore only
give a lower bound for the number of tests needed to find each bug. Figure 3 plots
this lower bound against the average number of tests needed by QuickCheck and
by QuickCover. For these bugs, it is clear that the enumerative approach is not
competitive with QuickCheck, let alone with QuickCover. The improvement in
the numbers of tests needed ranges from 1.7 to 5.5 orders of magnitude, with a
mean across all the bugs of 3.3 orders of magnitude.

Fig. 3. System F MTTF for QuickCheck and QuickCover, and the lower bound on the
number of tests run by our Lazy SmallCheck variant, $\log_{10}$ scale.

⁵ Compiled with ghc -O2, on an Intel i7-6700k with 32GB of RAM under Windows
10.


7.3 Comparison with Fuzzing Techniques 


Coverage-guided fuzzing tools like AFL [22] can be viewed as a way of using a 
different form of feedback (branch instead of combinatorial coverage) to improve 
the generation of random inputs by finding more “interesting” tests. Fuzzing 
is a huge topic [43] that has exploded in popularity recently, with researchers 
evaluating the benefits of using more forms of feedback [13,31], incorporating 
learning [28,33] or symbolic [39,42] techniques, and bringing the benefits of these 
methods to functional programming [11,21]. One fundamental difference, how- 
ever, is that all of these techniques are online and grey-box: they instrument and 
execute the program on various inputs in order to obtain feedback. In contrast, 
combinatorial coverage can be computed without any knowledge of the code it- 
self, therefore providing a convenient black-box alternative that can be valuable 
when the same test suite is to be used for many versions of the code (such as
in regression testing) or when executing the code is costly (such as when testing
production compilers).

Chen et al.’s adaptive random testing (ART) [4] uses an algorithm that, like 
QuickCover’s, generates a set of random tests and selects the most interesting to 
run. Rather than using combinatorial coverage, ART requires a distance metric 
on test cases—at each step, the candidate which is farthest from the already-run 
tests is selected. Chen et al. show that this approach finds bugs after fewer tests, 
on average, in the programs they study. ART was first proposed for programs 
with numerical inputs, but Ciupa et al. [5] showed how to define a suitable metric 
on objects in an object-oriented language and used it to obtain a reduction of 
up to two orders of magnitude in the number of tests needed to find a bug. Like 
combinatorial testing, ART is a black-box approach that depends only on the 
test cases themselves, not on the code under test. 
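For comparison with QuickCover's selection step, here is a hedged sketch of an ART-style selection step on numerical inputs. Reading "farthest from the already-run tests" as "maximizes the distance to the nearest executed test" is an assumption of this sketch, as are the function names and the absolute-difference metric.

```python
def art_select(candidates, executed, distance):
    """Pick the candidate whose nearest executed test is farthest away."""
    if not executed:
        return candidates[0]
    return max(candidates,
               key=lambda c: min(distance(c, e) for e in executed))


calls = []  # one entry per distance computation


def counted_distance(a, b):
    calls.append((a, b))
    return abs(a - b)


chosen = art_select([1, 5, 9], executed=[0, 10], distance=counted_distance)
```

In this run the candidate 5 is chosen, and the step performs one distance computation per pair of candidate and executed test (six here), so the work per step grows with the number of tests already run.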

However, Arcuri and Briand [1] question ART’s value in practice, because 
of the quadratic number of distance computations it requires, from each new 
test to every previously executed test; in a large empirical study, they found 
that the cost of these computations made ART uncompetitive with ordinary 
random testing. While our approach also has significant computational overhead, 
the time and space complexity grow with the number of possible descriptions 
(derived from the data type definition and the choice of strength) and not with 
the total number of tests run—i.e., testing will not slow down over time. In 
addition, our approach works in situations where a distance metric between 
inputs does not make sense. 


8 Conclusion and Future Work 


We have presented a generalized definition of combinatorial coverage and an
effective way to use that definition for property-based testing. With the help of
regular tree expressions, the definition extends combinatorial coverage to the realm
of algebraic data types. Our sparse test descriptions provide
a robust way to look at combinatorial testing, which specializes to the classical 
approach. We use these sparse descriptions as a basis for QuickCover—a tool that 
thins a random generator to increase combinatorial coverage. Two case studies 
show that QuickCover is useful in practice, finding bugs using an average of 10x 
fewer tests. 

The rest of this section sketches a number of potential directions for further 
research. 


8.1 Variations 


Our experiments show that sparse test descriptions are a good way to define 
combinatorial coverage for algebraic data types, but they are certainly not the
only way. Here we discuss some variations and why they might be interesting
to explore.

Representative Samples of Large Types Perhaps it is possible to do combinatorial
testing with ADTs by having humans decide exactly which trees to cover. This 
approach is already widely used in combinatorial testing to deal with types like 
machine integers that, though technically finite, are much too large for testing to 
efficiently cover all their “constructors.” For example, if a human tester knows 
(by reading the code, or because they wrote it) that it contains an if-statement 
guarded by x < 5, they might choose to cover 


$$x \in \{-2147483648,\ 0,\ 4,\ 5,\ 6,\ 2147483647\}.$$


The tester might choose values around 5 because those are important to the
specific use case, and the boundary values and 0 to check for common edge cases.
Concretely, this practice means that instead of trying to cover
$\text{tuple}_3(\text{INT}, \text{true} + \text{false}, \text{true} + \text{false})$, the tester covers the specification


$$\text{tuple}_3(-2147483648 + 0 + 4 + 5 + 6 + 2147483647,\ \text{true} + \text{false},\ \text{true} + \text{false}).$$


In our setting, this might mean choosing a representative set of constructor 
trees to cover, and then treating them like a finite set. In much the same way as 
with integers, rather than cover 


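$$\text{tuple}_3(\tau_{\text{list(bool)}},\ \text{true} + \text{false},\ \text{true} + \text{false}),$$


we could treat a selection of lists as atomic constructors, and cover the
specification


$$\text{tuple}_3([\,] + [\text{true}, \text{false}] + [\text{false}, \text{false}, \text{false}],\ \text{true} + \text{false},\ \text{true} + \text{false}),$$

which has 2-way descriptions like

$$\text{tuple}_3([\,], \top, \text{false}) \quad\text{and}\quad \text{tuple}_3([\text{true}, \text{false}], \text{true}, \top).$$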


Just as testers choose representative sets of integers, they could choose sets of 
trees that they think are interesting and only cover those trees. Of course, the 
set of all trees for a type is usually much larger and more complex than the set 
of integers, so this approach may not be as practical for structured types as for 
integers. Still, it is possible that small amounts of human intervention could help 
guide the choice of descriptions to cover. 
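Once representative values have been chosen, enumerating the classical t-way descriptions is mechanical. The sketch below does this for the integer example above; the `TOP` sentinel, the use of strings for the Boolean constructors, and the function name are assumptions made for illustration.

```python
from itertools import combinations, product

TOP = "TOP"


def t_way_descriptions(parameters, t):
    """All descriptions that fix exactly t positions and leave the rest TOP."""
    descriptions = []
    for positions in combinations(range(len(parameters)), t):
        for values in product(*(parameters[i] for i in positions)):
            desc = [TOP] * len(parameters)
            for position, value in zip(positions, values):
                desc[position] = value
            descriptions.append(tuple(desc))
    return descriptions


ints = [-2147483648, 0, 4, 5, 6, 2147483647]
bools = ["true", "false"]
pairs = t_way_descriptions([ints, bools, bools], t=2)
```

For six representative integers and two Booleans this yields 6·2 + 6·2 + 2·2 = 28 two-way descriptions; the same function would apply unchanged if the first parameter were a hand-picked list of trees instead of integers.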


Type-Tagged Constructors Another variation to our approach would change the 
way that ADTs are translated into constructor trees. In Appendix B we show 
a simple example of a Translation for lists of Booleans, but an interesting 
problem arises if we consider lists of lists of Booleans. The most basic approach 
would be to use the same constructors (LCNil and LCCons) for both “levels” of 
list. For example, [[True]] would become (with a small abuse of notation) 


LCCons (LCCons LCTrue LCNil) LCNil.

Depending on the application, it might actually make more sense to use different
constructors for the different list types ([Bool] vs. [[Bool]]). For example,
[[True]] could instead be translated as


LCOuterCons (LCInnerCons LCTrue LCInnerNil) LCOuterNil 


(with a slight abuse of notation), allowing for a broader range of potential test 
descriptions. This observation can be generalized to any polymorphic ADT: any 
time a single constructor is used at multiple types, it is likely beneficial to dif- 
ferentiate between them by translating to constructor tree nodes tagged with a 
monomorphized type. 
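
The difference between the two translations can be sketched in Python. This is a hypothetical stand-in for the Haskell translation: nested Python lists play the role of [[Bool]], and the nesting depth is used as a proxy for the monomorphized type tag, which is only adequate when the nesting is uniform.

```python
def translate(value, tagged=False, depth=0):
    """Translate nested lists of bools into a constructor tree.

    A tree is a tuple (constructor_name, child, ...). With tagged=True the list
    constructors carry the nesting depth as an assumed stand-in for a type tag.
    """
    if isinstance(value, bool):
        return ("LCTrue",) if value else ("LCFalse",)
    suffix = f"@{depth}" if tagged else ""
    if not value:
        return ("LCNil" + suffix,)
    head, tail = value[0], value[1:]
    return ("LCCons" + suffix,
            translate(head, tagged, depth + 1),   # elements live one level deeper
            translate(tail, tagged, depth))       # the tail stays on this level


def constructors(tree):
    """Set of constructor names occurring anywhere in the tree."""
    names = {tree[0]}
    for child in tree[1:]:
        names |= constructors(child)
    return names


untagged = translate([[True]])
tagged = translate([[True]], tagged=True)
```

The tagged tree for [[True]] mentions five distinct constructors instead of three, which is what enlarges the space of possible test descriptions.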


Pattern Descriptions A third potential variation is a modification to make test 
descriptions a bit less sparse. Recall that sparse test descriptions are defined as 


d ≜ ⊤ | ◇C(d₁, …, dₙ). 


What if we chose this instead? 


d ≜ ◇d′ 
d′ ≜ ⊤ | C(d′₁, …, d′ₙ) 


In the former case, every relationship is “eventual”: there is never a requirement 
that a particular constructor appear directly beneath another. In the latter case, 
the descriptions enforce a direct parent-child relationship, and we simply allow 
the expression to match anywhere in the term. We might call this class “pattern” 
test descriptions. 

We chose sparse descriptions for this work because putting ◇ before every 
constructor leaves more opportunities for nodes matching different descriptions 
to be “interleaved” within a term, leading to smaller test suites in general. In 
some small experiments, this alternative proposal seemed to perform similarly 
in most cases and worse in a few. Even so, experimenting with the use 
of eventually in descriptions might lead to interesting new ideas. 
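
To make the distinction tangible, here is a minimal Python sketch of both matching disciplines. It assumes that "eventually" means "matches at some subterm", represents a term as a pair of a constructor name and a list of children, and writes a description as a plain constructor tree in which `None` stands for ⊤; the function names are illustrative only.

```python
TOP = None  # the description ⊤, which matches every term


def subterms(t):
    """Yield t and all of its descendants; a term is (constructor, [children])."""
    yield t
    for child in t[1]:
        yield from subterms(child)


def matches_here(d, t, child_match):
    """d's root constructor matches t's root; children are checked by child_match."""
    return (d[0] == t[0] and len(d[1]) == len(t[1])
            and all(child_match(dc, tc) for dc, tc in zip(d[1], t[1])))


def matches_direct(d, t):
    """Pattern body: constructors must appear directly beneath one another."""
    return d is TOP or matches_here(d, t, matches_direct)


def matches_pattern(d, t):
    """'Pattern' description: the whole pattern may match anywhere in the term."""
    return any(matches_direct(d, s) for s in subterms(t))


def matches_sparse(d, t):
    """Sparse description: an 'eventually' precedes every constructor."""
    return d is TOP or any(matches_here(d, s, matches_sparse) for s in subterms(t))


nil = ("nil", [])
true, false = ("true", []), ("false", [])


def cons(head, tail):
    return ("cons", [head, tail])


term = cons(true, cons(false, nil))   # the list [true, false]
desc = cons(true, nil)                # read as cons(true, nil)
```

Read sparsely, `desc` is satisfied by `term`, because nil occurs somewhere below the second argument of the outer cons. Read as a pattern, it is not, because no cons node has true and nil as its direct children.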


8.2 Combinatorial Coverage of More Types 


Our sparse tree description definition of combinatorial coverage is focused on 
inductive algebraic types. While these encompass a wide range of the types that 
functional programmers use, it is far from everything. One interesting extension 
would generalize descriptions to co-inductive types. We actually think that the 
current definition might almost suffice—regular tree expressions can denote infi- 
nite structures, so this generalization would likely only affect our algorithms and 
the implementation of QuickCover. We also should be able to include Generalized 
Algebraic Data Types (GADTs) without too much hassle. The biggest unknown 
is function types, which seem to require something more powerful than regular 
tree expressions to describe; indeed, it is not clear that combinatorial testing 
even makes sense for higher-order values. 


8.3 Regular Tree Expressions for Directed Generation 


As we have shown, regular tree expressions are a powerful language for picking 
out subsets of types. In this paper, we mostly focused on automatically generat- 
ing small descriptions, but it might be possible to apply this idea more broadly 
for specifying sets of tests. One straightforward extension would be to use the 
same machinery that we use for QuickCover but, instead of covering an automat- 
ically generated set of descriptions, ensure that, at a minimum, some manually 
specified set of expressions is covered. For example, we could use a modified 
version of our algorithm to generate a test set where 


nil, cons(⊤, nil), and μX. cons(true, X) + nil 


are all covered. (Concretely, this would be a test suite containing, at a minimum, 
the empty list, a singleton list, and a list containing only true.) This might be 
useful for cases where the testers know a priori that certain shapes of inputs 
are important to test, but they still want to explore random inputs with those 
shapes. 

A different approach would be to create a tool that synthesizes QuickCheck 
generators that only generate terms matching a particular regular tree expres- 
sion. This idea, related to work on adapting branching processes to control test 
distributions [29], would make it easy to write highly customized generators and 
meticulously control the generated test suites. 


Abstract. We present a framework to verify both functional correctness 
and worst-case complexity of practically efficient algorithms. We 
implemented a stepwise refinement approach, using the novel concept of 
resource currencies to naturally structure the resource analysis along the 
refinement chain, and allow a fine-grained analysis of operation counts. 
Our framework targets the LLVM intermediate representation. We extend 
its semantics from earlier work with a cost model. As a case study, we 
verify the correctness and O(n log n) worst-case complexity of an 
implementation of the introsort algorithm, whose performance is on par with 
the state-of-the-art implementation found in the GNU C++ Library. 


Keywords: Algorithm Analysis · Program Verification · Refinement 


1 Introduction 


In general, not only the correctness but also the complexity of algorithms is 
important. While it is obvious that the performance observed during experiments 
is essential to solve practical problems efficiently, the theoretical worst-case 
complexity of algorithms is also crucial: a good worst-case complexity avoids timing 
regressions when hitting worst-case input and, even more importantly, prevents 
denial-of-service attacks that intentionally produce worst-case scenarios to 
overload critical computing infrastructure. 

For example, the C++ standard requires implementations of std::sort to have 
worst-case complexity O(n log n) [7]. Note that this rules out quicksort [12], 
which is very fast in practice, but has quadratic worst-case complexity. 
Nevertheless, some standard libraries, most prominently LLVM’s libc++ [20], still use 
sorting algorithms with quadratic worst-case complexity. 

A practically efficient sorting algorithm with O(n log n) worst-case complexity 
is Musser’s introsort [22]. It combines quicksort with the O(n log n) heapsort 
algorithm, which is used as a fallback when the quicksort recursion depth 
exceeds a certain threshold. This makes it possible to implement standard-compliant, 
practically efficient sorting algorithms. Introsort is implemented by, e.g., the GNU 
C++ Library (libstdc++) [8]. 

In this paper, we present techniques to formally verify both correctness and 
worst-case complexity of practically efficient implementations. We build on two 
previous lines of research by the authors. 

On the one hand, we have the Isabelle Refinement Framework [19], which allows 
for a modular top-down verification approach. It utilizes stepwise refinement to 
separate the different aspects of an efficient implementation, such as the 
algorithmic idea and low-level optimizations. It provides a nondeterminism monad to 
formalize programs and refinements, and the Sepref tool to automate canonical 
data refinement steps. Its recent LLVM back end [15] makes it possible to verify 
algorithms with competitive performance compared to (unverified) highly optimized 
C/C++ implementations. The Refinement Framework has been used to verify 
the functional correctness of an implementation of introsort that performs on 
par with libstdc++’s implementation [17]. 

On the other hand, we have already extended the Refinement Framework to 
reason about complexity [11]. However, this extension only supports the Imperative/HOL 
back end [16]. That back end generates implementations in functional languages, which are 
inherently less efficient than highly optimized C/C++ implementations. This 
paper combines and extends these two approaches. Our main contributions are: 


• We present a generalized nondeterminism monad with resource cost, apply it 
  to resource functions to model fine-grained currencies (Section 2) and show 
  how they can be used to naturally structure refinement. 

• We extend the LLVM back end [15] with a cost model, and amend its basic 
  reasoning infrastructure (Section 3). 

• We extend the Sepref tool (Section 4) to synthesize executable imperative 
  code in LLVM, together with a proof of correctness and complexity. Our 
  approach seamlessly supports imperative and amortized data structures. 

• We extend the verification of introsort to also show a worst-case complexity of 
  O(n log n), thus meeting the C++11 stdlib specification (Section 5). The 
  performance of our implementation is still on par with libstdc++. We believe 
  that this is the first time that both correctness and complexity of a sorting 
  algorithm have been formally verified down to a competitive implementation. 


Our formalization is available at https://www21.in.tum.de/~haslbema/ 


2 Specification of Algorithms With Resources 


We use the formalism of monads [24] to elegantly specify programs with resource 
usage. We first describe a framework that works for a very generic notion of 
resource, and then instantiate it with resource functions, which model resources 
of different currencies. We then describe a refinement calculus and show how 
currencies can be used to structure stepwise refinement proofs. Finally, we report 
on automation and give some examples. 


2.1 Nondeterministic Computations With Resources 


Let us examine the features we require for our computation model. 

First, we want to specify programs by their desired properties, without having 
to fix a concrete implementation. In general, those programs have more than one 
correct result for the same input. Consider, e.g., sorting a list of pairs of numbers 
by the first element. For the input [(1, 2), (2,2), (1,3)], both [(1, 2), (1,3), (2, 2)] 
and [(1,3), (1,2), (2,2)] are valid results. Formally, this is modelled as a set of 
possible results. When we later fix an implementation, the set of possible results 
may shrink. For example, the (stable) insertion sort algorithm always returns 
the list [(1,2), (1,3), (2,2)]. We say that insertion sort refines our specification 
of sorting. 

Second, we want to define recursion by a standard fixed-point construction 
over a flat lattice. The bottom of this lattice must be a dedicated element, which 
we call fail. It represents a computation that may not terminate. 

Finally, we want to model the resources required by a computation. For 
nondeterministic programs, these may vary depending on the nondeterministic 
choices made during the computation. As we model computations by their pos- 
sible results, rather than by the exact path in the program that leads to the 
result, we also associate resource cost with possible results. When more than 
one computation path leads to the same result, we take the supremum of the 
used resources. The notion of refinement is now extended to a subset of results 
that are computed using fewer resources. 

We now formalize the above intuition: The type 


(α,γ) NREST = fail | res (α → γ option) 


models a nondeterministic computation with results of type α and resources of 
type γ.⁴ That is, a computation is either fail, or res M, where M is a partial 
function from possible results to resources. 

We define spec Φ T as a computation of any result r that satisfies Φ r 
using T r resources: spec Φ T = res (λr. if Φ r then Some (T r) else None). 
By abuse of notation, we write spec x T for spec (λr. r=x) (λ_. T). 

Based on an ordering on the resources γ, we define the refinement ordering on 
NREST, by first lifting the ordering to option with None as the bottom element, 
then pointwise to functions and finally to (α,γ) NREST, setting fail as the top 
element. This matches the intuition of refinement: m ≤ m′ reads as m refines m′, 
i.e., m has fewer possible results than m′, computed with fewer resources. 
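
The refinement ordering can be illustrated with a small executable model. The following Python sketch is not the Isabelle/HOL formalization: it assumes a single numeric resource, restricts spec to a finite list of candidate results, and represents a computation either as the string `"fail"` or as a dictionary from results to costs, where a missing key plays the role of None. The budgets 9 and 6 are arbitrary.

```python
from itertools import permutations

FAIL = "fail"  # top element of the refinement ordering


def refines(m, m_abs):
    """m ≤ m_abs: fewer possible results, each computed with no more resources."""
    if m_abs == FAIL:
        return True            # every computation refines fail
    if m == FAIL:
        return False           # fail only refines fail
    return all(r in m_abs and cost <= m_abs[r] for r, cost in m.items())


def spec(candidates, phi, cost):
    """spec Φ T, restricted to a finite list of candidate results (an assumption)."""
    return {r: cost(r) for r in candidates if phi(r)}


def sorted_by_first(ys):
    return all(a[0] <= b[0] for a, b in zip(ys, ys[1:]))


xs = ((1, 2), (2, 2), (1, 3))
sort_spec = spec(permutations(xs), sorted_by_first, lambda ys: 9)
insertion_sort = {((1, 2), (1, 3), (2, 2)): 6}   # the single result of a stable sort
```

Here `sort_spec` admits both orderings of the two pairs with first component 1, while the stable implementation commits to one of them at a lower cost, so it refines the specification but not vice versa.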

We require the resources γ to have a complete lattice structure, such that 
we can form suprema over the (possibly infinitely many) paths that lead to the 
same result. Moreover, when sequentially composing computations, we need to 
add up the resources. This naturally leads to a monoid structure (γ, 0, +), where 
0, intuitively, stands for no resources. 

We call such types γ resource types, if they have a complete lattice and monoid 
structure. Note that, in an earlier iteration of this work [11], the resource type 
was fixed to extended natural numbers (enat = ℕ ∪ {∞}), measuring the resource 
consumption with a single number. Also note that (α,unit) NREST is isomorphic 
to our original nondeterministic result monad without resources [19]. 

If γ is a resource type, so is η → γ. Intuitively, such resources consist of coins 
of different resource currencies η, the amount of coins being measured by γ. 


4 The name NREST abbreviates Nondeterministic RESult with Time, and has been 
inherited from our earlier formalizations. 


Example 1. In the following we use the resource type ecost = string → enat, i.e., 
we have currencies described by a string, whose amount is measured by extended 
natural numbers, where ∞ models arbitrary resource usage. Note that, while the 
resource type string → enat guides intuition, most of our theory works for general 
resource types of the form η → γ or even just γ. 

We define the function $_s n to be the resource function that uses n :: enat 
coins of the currency s :: string, and write $_s as shortcut for $_s 1. 

A program that sorts a list in O(n²) can be specified by: 


sort_spec xs = spec (λxs′. sorted xs′ ∧ mset xs′ = mset xs) ($_q |xs|² + $_c) 


that is, a list xs can result in any sorted list xs’ with the same elements, and 
the computation takes (at most) quadratically many q coins in the list length, 
and one c coin, independently of the list length. Intuitively, the q and c coins 
represent the constant factors of an algorithm that implements that specification 
and are later elaborated by exchanging them into several coins of more fine- 
grained currencies, corresponding to the concrete operations in the algorithm, 
e.g., comparisons and memory accesses. Abstract currencies like q and c only 
“have value” if they can be exchanged to meaningful other currencies, and finally 
pay for the resource costs of a concrete implementation. 
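
As a rough illustration of resource functions, the following Python sketch models a cost as a `collections.Counter` from currency names to amounts. It is a simplification: only finite amounts are represented (no ∞), and `coin` and `sort_spec_cost` are hypothetical helper names.

```python
from collections import Counter


def coin(currency, amount=1):
    """A resource function with `amount` coins of one currency and 0 elsewhere."""
    return Counter({currency: amount})


def sort_spec_cost(xs):
    """Cost bound of the sorting specification: |xs|² coins of q plus one coin of c."""
    return coin("q", len(xs) ** 2) + coin("c")


cost = sort_spec_cost([3, 1, 2])
```

For a list of length 3 this yields nine q coins and one c coin. Addition is pointwise, and a currency that does not occur simply has amount 0.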


2.2 Atomic Operations and Control Flow 


In order to conveniently model actual computations, we define some combinators. 
The elapse m t combinator adds the (constant) resources t to all results of m: 


elapse :: (α,γ) NREST → γ → (α,γ) NREST 
elapse fail t = fail 
elapse (res M) t = res (λx. case M x of None ⇒ None 
                                        | Some t′ ⇒ Some (t + t′)) 


The program⁵ return x computes the single result x without using any resources: 


return :: α → (α,γ) NREST 
return x = res [x ↦ 0] 


5 Note that our shallow embedding makes no formal distinction between syntax and 
semantics. Nevertheless, we refer to an entity of type NREST as program to 
emphasize the syntactic aspect, and as computation to emphasize the semantic aspect. 


The combinator bind m f models the sequential composition of computations m 
and f, where f may depend on the result of m: 


bind :: (α,γ) NREST → (α → (β,γ) NREST) → (β,γ) NREST 
bind fail f = fail 
bind (res M) f = Sup { elapse (f x) t | x t. M x = Some t } 


If the first computation m fails, then the sequential composition also fails. 
Otherwise, we consider all possible results x with resources t of m, invoke f x, and 
add the cost t for computing x to the results of f x. The supremum aggregates 
the cases where f yields the same result, via different intermediate results of m, 
and also makes the whole expression fail if one of the f x fails. 
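
The behaviour of these combinators on finite computations can be sketched in Python. The model below is an approximation under stated assumptions: costs are plain numbers (a single currency), a computation is `"fail"` or a finite dictionary from results to costs, and the supremum is therefore just a pointwise maximum.

```python
FAIL = "fail"


def ret(x):
    """return x: the single result x at cost 0."""
    return {x: 0}


def elapse(m, t):
    """Add the constant cost t to every result of m."""
    return FAIL if m == FAIL else {x: t + c for x, c in m.items()}


def sup(ms):
    """Supremum of computations: fail is top, otherwise pointwise maximum of costs."""
    out = {}
    for m in ms:
        if m == FAIL:
            return FAIL
        for x, c in m.items():
            out[x] = max(out.get(x, c), c)
    return out


def bind(m, f):
    """Sequential composition: run m, then f on each possible result of m."""
    if m == FAIL:
        return FAIL
    return sup(elapse(f(x), t) for x, t in m.items())


m = {1: 2, 2: 5}                       # result 1 at cost 2, result 2 at cost 5
collapsed = bind(m, lambda x: {0: x})  # both paths end in result 0
```

Both paths through `collapsed` produce the result 0, at total costs 3 and 7, and the aggregated computation records the larger of the two.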


Example 2. We now illustrate an effect that stems from our decision to aggregate 
the resource usage of different computation paths that lead to the same result. 
Consider the program 


res (λn::nat. Some ($_c n)); return 0 


It first chooses an arbitrary natural number n consuming n coins of currency c, 
and then returns the result 0. That is, there are arbitrarily many paths that lead 
to the result 0, consuming arbitrarily many c coins. The supremum of this is ∞, 
such that the above program is equal to elapse (return 0) ($_c ∞). Note that 
none of the computation paths actually attains the aggregated resource usage. 
We will come back to this in Section 4.4. 


Finally, we use Isabelle/HOL’s if-then-else and define a recursion combinator 
rec via a fixed-point construction [13], to get a complete set of basic combinators. 
As these combinators also incur cost in the target LLVM, we define 
resource-aware variants. Furthermore, we also derive a while combinator: 


if_c b then c₁ else c₂ = elapse (r ← b; if r then c₁ else c₂) $_if 
rec_c F x = elapse (rec (λD x. F (λx. elapse (D x) $_call) x) x) $_call 
while_c b f s = rec_c (λD s. if_c b s then s ← f s; D s else return s) s 


Here, the guard of if_c is a computation itself, and we consume an additional 
if coin to account for the conditional branching in the target model. Similarly, 
every recursive call consumes an additional call coin. 

Assertions fail if their condition is not met, and return unit otherwise: 


assert P = if P then return () else fail 


They are used to express preconditions of a program. A Hoare-triple for program 
m, with precondition P, postcondition Q and resource usage t is written as a 
refinement condition: m ≤ assert P; spec Q (λ_. t) 


Example 3. Comparison of two list elements at a cost of t can be specified by: 


idxs_cmp_spec xs i j (t) = assert (i<|xs| ∧ j<|xs|); spec (xs!i < xs!j) (λ_. t) 


where xs!i is the ith element of list xs. Instead of fixing the cost for specifications, 
we pass it as a parameter t. This allows us to refine different instances of 
abstract data types (here lists) by different concrete data structures with different 
costs. To make bigger programs more readable, we note the cost parameter in 
parentheses at the end of the line, as, e.g., in Example 4. 


2.3 Refinement on NREST 


We have used the refinement ordering to express Hoare triples. Two other ap- 
plications of refinement are data refinement and currency refinement. 


Data Refinement A typical use-case of refinement is to implement an abstract 
data type by a concrete data type. For example, we could implement (finite) sets 
of numbers by sorted lists. We define a refinement relation R between sorted 
lists and sets. A concrete computation m† that yields sorted lists then refines 
an abstract computation m that yields sets, if every possible concrete result is 
related to a possible abstract result. Formally, m† ≤ ⇓_D R m, where the operator 
⇓_D is defined, for arguments R and m, by the following two rules. 


⇓_D R (res M) = res (λc. Sup {M a | a. (c,a) ∈ R})        ⇓_D R fail = fail 


Again, we use the supremum to aggregate the costs of all abstract results that 
are related to a concrete result. As in Example 2, this leads to the possibility 
that the supremum cost is not attained, which we discuss in Section 4.4. 
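
On finite data, the effect of the data refinement operator can be sketched as follows. This Python model makes several assumptions: costs are single numbers, the relation R is a finite set of (concrete, abstract) pairs, sorted tuples and frozensets stand in for sorted lists and sets, and a concrete result with no related abstract result is simply absent from the output (the supremum of the empty set).

```python
FAIL = "fail"


def data_refine(R, m):
    """Translate an abstract computation m into a bound on concrete results."""
    if m == FAIL:
        return FAIL
    out = {}
    for c, a in R:
        if a in m:                                   # a is a possible abstract result
            out[c] = max(out.get(c, m[a]), m[a])     # supremum over related results
    return out


R = {((1, 2), frozenset({1, 2})), ((3,), frozenset({3}))}
abstract = {frozenset({1, 2}): 4, frozenset({3}): 1}
concrete_bound = data_refine(R, abstract)
```

A concrete computation that returns the sorted list (1, 2) at cost 3 stays within `concrete_bound`, which allows cost 4 for that result. The maximum only matters when a concrete result is related to several abstract ones.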


Currency Refinement Suppose we want to refine Example 3 into a program that 
first accesses the elements and then compares them. 


Example 4. We refine idxs_cmp_spec ($_idxs_cmp) from Example 3 as follows: 


idxs_cmp xs i j = 
  assert (i<|xs| ∧ j<|xs|); 
  xsi ← list_get_spec xs i;   ($_lookup) 
  xsj ← list_get_spec xs j;   ($_lookup) 
  return (xsi < xsj)          ($_less) 


where list_get_spec xs i (T) = assert (i < |xs|); spec (xs!i) T and return x (T) 
returns the result x incurring cost T. 

Note that idxs_cmp and idxs_cmp_spec use different, incompatible currency 
systems. To compare them, we need to exchange coins: one idxs_cmp coin will 
be traded for two lookup coins and one less coin. 


To make that happen we introduce the currency refinement ⇓_C E m. Here, 
the exchange rate E :: η_a → η_c → γ specifies for each abstract currency c_a :: η_a 
how many of the coins of the concrete currency c_c :: η_c are needed. Note that, 
in general, one abstract coin may be exchanged into multiple coins of different 
currencies. For a resource type γ that provides a multiplication operation (*) we 
define the operator ⇓_C with the following two rules. 


⇓_C E (res M) = res (λr. case M r of None ⇒ None 
                        | Some t ⇒ Some (λc_c. Σ_{c_a} t c_a * E c_a c_c)) 
⇓_C E fail = fail 


The refined computation has the same results as the original. To get the amount 
of a concrete coin c_c for some result r with resource function t, we sum, over all 
abstract coins c_a, the amount of abstract coins needed in the original 
computation (t c_a) weighted by the exchange rate (E c_a c_c). 

For the sum to make sense, there must be only finitely many abstract coins c_a 
with t c_a * E c_a c_c ≠ 0. This can be ensured by restricting the resource functions 
t of the computation to use finitely many different coins, or by restricting the 
exchange rate E accordingly. The latter can be checked syntactically in practice. 


Example 5. For refining the specification idxs_cmp_spec we can use the exchange 
rate E₁ = 0(idxs_cmp := $_lookup 2 + $_less), which does the correct exchange for 
idxs_cmp and is zero everywhere else. Here, + and 0 are lifted to functions in a 
pointwise manner, and f(·:=·) denotes a function update. We can now prove: 


idxs_cmp xs i j ≤ ⇓_C E₁ (idxs_cmp_spec xs i j ($_idxs_cmp)) 
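
The exchange of one idxs_cmp coin for two lookup coins and one less coin can be replayed in a few lines of Python. The sketch only covers the cost part of a currency refinement, assumes finite natural amounts, and uses `collections.Counter` as a stand-in for resource functions; `exchange` and `leq` are hypothetical helper names.

```python
from collections import Counter


def exchange(E, t):
    """Convert abstract coins t into concrete coins using the exchange rate E.

    E maps each abstract currency to a Counter of concrete coins per abstract coin;
    currencies missing from E are exchanged for nothing (rate zero).
    """
    out = Counter()
    for c_a, amount in t.items():
        for c_c, rate in E.get(c_a, {}).items():
            out[c_c] += amount * rate
    return out


def leq(t1, t2):
    """Pointwise order on resource functions (t2 must be a Counter)."""
    return all(n <= t2[c] for c, n in t1.items())


E1 = {"idxs_cmp": Counter(lookup=2, less=1)}   # zero everywhere else
spec_cost = Counter(idxs_cmp=1)                # budget of the specification
impl_cost = Counter(lookup=2, less=1)          # two lookups and one comparison
budget = exchange(E1, spec_cost)
```

After the exchange, the specification's budget is expressed in the implementation's currencies, and the implementation's cost can be compared against it pointwise.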


2.4 Refinement Patterns 


In practice, we encounter certain recurring patterns of refinement, which we 
describe in this section. 


Refinement of Specifications Instead of only asking whether a program m satisfies 
a specification res M, we also ask how much it satisfies the specification, i.e., 
what is the difference of the resources specified and actually used, denoted by 
gwp m M.⁶ We have the following equality: m ≤ res M ⟷ Some 0 ≤ gwp m M. 

To get some intuition let us fix the resource to be time. Then, gwp m M is 
the latest feasible time at which we can start m to still match the deadline M. 
If there is no feasible starting time (gwp m M = None), m does not fulfill the 
specification M. If it has some value t, this is the latest feasible starting time of 
all computation paths in m. 
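
For the time reading, this intuition can be turned into a small Python sketch. It is an assumed interpretation for finite, single-number costs, not the definition used in the formalization: a computation is `"fail"` or a dictionary from results to the time used, and M maps each allowed result to its deadline.

```python
import math

FAIL = "fail"


def gwp(m, M):
    """Latest feasible start time of m against the deadlines M, or None."""
    if m == FAIL:
        return None
    slack = math.inf                      # no results at all: any start time works
    for r, used in m.items():
        if r not in M or used > M[r]:
            return None                   # wrong result or deadline missed
        slack = min(slack, M[r] - used)   # latest start time for this path
    return slack


m = {"a": 3, "b": 5}
deadline = {"a": 10, "b": 6}
latest_start = gwp(m, deadline)
```

The path producing "a" could start as late as time 7, the path producing "b" only as late as time 1, so the latest start time that is safe for all paths is 1. In this model, m refines the specification exactly when `gwp` does not return `None`.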

Using gwp, we can implement a syntax-driven verification condition 
generator, as already described in [11]. 


Lockstep Refinement We often refine a compound program by refining some of 
its components. Let A and C be two structurally equal programs (i.e., they have 
the same structure of combinators if_c, rec_c, bind, etc.), and let A_i and C_i be the 
pairs of corresponding basic components, for i ∈ {0,...,n}. Provided with 
refinement lemmas Φ_i x ∧ (x†, x) ∈ R_i ⟹ C_i x† ≤ ⇓_D R′_i (⇓_C E (A_i x)) for each of 
those pairs,⁷ an automatic procedure walks through the program and establishes 
a refinement C ≤ ⇓_D R (⇓_C E A). This process generates verification conditions 
for ensuring the preconditions Φ_i, which can be discharged automatically or, if 
required, via interactive proof. 


6 The definition of gwp requires γ to provide a difference operator, dual to its + 
operator. It is a straightforward generalization of the concept defined in [11], and 
thus omitted here. We only note that the resource types unit, enat, and ecost provide 
a suitable difference operator. 

7 The refinement relations R_i and R′_i relate the parameters and the results, 
respectively, of those components. 

Note that, while the data refinements R_i can be different for each component 
i, the exchange rate E must be the same for all components. Currently, we align 
the exchange rates by manually deriving specialized versions of the component 
refinement lemmas. However, we believe that this can be automated in many 
practical cases, by collecting constraints on the exchange rate during the lockstep 
refinement, which are solved afterwards to obtain a unified exchange rate. We 
leave the implementation of this idea to future work. 


Separating Analysis of Resource Usage and Correctness We can disregard re- 
source usage and only focus on refinement of functional correctness, and then 
add resource usage analysis later. This is useful to separate the concerns of func- 
tional correctness and resource usage proof. We will describe a practical example 
later (Section 5.5), and only present an alternative way to prove the refinement 
in Example 4 here: 

First, for functional correctness, we use the specification idxs_cmp_spec (∞) 
and a program idxs_cmp_∞ similar to idxs_cmp but with all the costs replaced 
by ∞. Proving the refinement idxs_cmp_∞ xs i j ≤ idxs_cmp_spec xs i j (∞) only 
requires showing verification conditions that correspond to functional 
properties and termination. In particular, assertions and annotated invariants in 
the concrete program have to be proved. Proof obligations on resource 
usage, however, collapse into the trivial t ≤ ∞. For the same reason, we get 
idxs_cmp xs i j ≤ idxs_cmp_∞ xs i j, and by transitivity obtain 


idxs_cmp xs i j ≤ idxs_cmp_spec xs i j (∞) 


Next, we prove idxs_cmp xs i j ≤_n spec (λ_. True) ($_lookup 2 + $_less). Here, the 
refinement relation m ≤_n m′ = (m ≠ fail ⟹ m ≤ m′) assumes that the 
concrete program does not fail. This has the effect that, during the refinement proof, 
assertions and annotated invariants in the concrete program can be assumed to 
hold, and we can focus on the resource usage proof. 

Finally, the two refinements can be combined to obtain 


idxs_cmp xs i j ≤ idxs_cmp_spec xs i j ($_lookup 2 + $_less) 


3 LLVM With Cost Semantics 


The NREST monad allows us to specify programs with their resource usage in 
abstract currencies. Those currencies only have a meaning when they finally 
can be exchanged for the costs of concrete computations. In the following we 
present such a concrete computation model, namely a shallow embedding of 
the LLVM semantics into Isabelle/HOL. The embedding is an extension of our 
earlier work to also account for costs. In Section 4, we then report on linking 
the LLVM back end with the NREST front end. 


3.1 Basic Monad 


At the basis of our LLVM formalization is a monad that provides the notions of 
non-termination, failure, state, and execution costs. 


α mres = NTERM | FAIL | SUCC α cost state 
α M = state → α mres 


Here, cost is a type for execution costs, which forms a monoid with operation + 
and neutral element 0, and state is an arbitrary type.⁸ 

The type a M describes a program that, when executed on a state, either 
does not terminate (NTERM), fails (FAIL), or returns a result of type a, its 
execution costs, and a new state (SUCC). 

It is straightforward to define the monad operations return and bind, as well 
as a recursion combinator rec over M. Thanks to the shallow embedding, we can 
also use Isabelle/HOL’s if-then-else to get a complete set of basic operations. As 
an example, we show the definition of the bind operation, in the case that both 
arguments successfully compute a result: 


Assume m s = SUCC x c₁ s₁ and f x s₁ = SUCC r c₂ s₂; 
then we have bind m f s = SUCC r (c₁+c₂) s₂ 


That is, the result x and state s₁ after the first operation m are passed into the 
second operation f, and the result and state after the bind are what emerges from 
f. The cost for the bind is the sum of the costs for both operations. 

The basic monad operations do not cost anything. To account for execution 
costs, we define an explicit operation consume c s = SUCC () c s.⁹ 
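
A direct Python transcription of this monad may help to see how costs accumulate. It is a sketch under assumptions: the cost is a single integer rather than a function from instruction names to counts, the unit value is `None`, and the dictionary-based `load` operation is a hypothetical example that does not correspond to the memory model of the formalization.

```python
NTERM, FAIL = "NTERM", "FAIL"


def ret(x):
    """return: yield x at zero cost, leaving the state unchanged."""
    return lambda s: ("SUCC", x, 0, s)


def consume(c):
    """consume c: pay cost c and return unit."""
    return lambda s: ("SUCC", None, c, s)


def bind(m, f):
    """Run m, feed its result and state to f, and add up the costs."""
    def run(s):
        r1 = m(s)
        if r1 in (NTERM, FAIL):
            return r1
        _, x, c1, s1 = r1
        r2 = f(x)(s1)
        if r2 in (NTERM, FAIL):
            return r2
        _, r, c2, s2 = r2
        return ("SUCC", r, c1 + c2, s2)
    return run


def load(addr):
    """Hypothetical load: costs 1 and fails on an unknown address."""
    def raw(s):
        return ("SUCC", s[addr], 0, s) if addr in s else FAIL
    return bind(consume(1), lambda _unit: raw)


program = bind(load("p"), lambda x: ret(x + 1))
outcome = program({"p": 41})
```

Only `consume` contributes to the cost. `ret` and `bind` are free, so `program` costs exactly one unit for its single load.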


3.2 Shallowly Embedded LLVM Semantics 


The formalization of the LLVM semantics is organized in layers. At the bot- 
tom, there is a memory model that stores deeply embedded values, and comes 
with basic operations for allocation/deallocation, loading, storing, and pointer 
manipulation. Also the basic arithmetic operations are defined on deeply em- 
bedded integers. These operations are phrased in the basic monad, but consume 
no costs. This way, we could take them unchanged from our original LLVM 
formalization without cost [15]. For example, the low-level load operation has the 
signature raw_load :: raw_ptr → val M. Here, raw_ptr is the pointer type of our 
memory model, consisting of a block address and an offset, and val is our value 
type, which can be an integer, a pointer, or a pair of values. 

On top of the basic layer, we define operations that correspond to the actual 
LLVM instructions. Here, we map from deeply embedded values to shallowly 
embedded values, and add the execution costs. 

For example, the semantics of LLVM’s load instruction is defined as follows: 


8 Note that this differs from the NREST monad in Section 2: it is deterministic, 
and provides a state. Because of determinism, we never need to form a supremum, 
and thus can base our cost model on natural numbers rather than enats. We leave 
a unification of the two monads to future work. 

9 For NREST, we defined a higher-order operation elapse, while we use the 
first-order operation consume here. This is for historical reasons. Note that elapse can 
be defined in terms of consume, and vice versa. 


ll_load :: α ptr → α M 

ll_load p = 
  consume $_load; 
  r ← raw_load (the_raw_ptr p); 
  checked_from_val r 


It consumes the cost¹⁰ for the operation, and then forwards to the raw_load 
operation of the lower layer, where the_raw_ptr and checked_from_val convert 
between the shallow and deep embedding of values. 


As in the original formalization,¹¹ an LLVM program is represented by a 
set of monomorphic constant definitions of the shape def, defined as follows: 


def   = proc_name var* = block 
block = var ← cmd; block | return var 
cmd   = ll_<opcode> arg* | ll_call proc_name arg* | llc_if arg block block 
      | llc_while block block 
arg   = var | number | null | init 


The code generator checks that the set of definitions is complete and adheres 
to the required shape. It then translates them into LLVM code, which merely 
amounts to pretty printing and translating the structured control flow of if 
and while¹² statements to the unstructured control flow of LLVM. A powerful 
preprocessor can convert a more general class of terms to the restricted shape 
required by the code generator. This conversion is done inside the logic, i.e., 
the processed program is proved to be equal to the original. Preprocessing steps 
include monomorphization of polymorphic constants, extraction of fixed-point 
combinators to recursive function definitions, and conversion of tuple construc- 
tors and destructors to LLVM’s insertvalue and extractvalue instructions. 


In summary, the layered architecture of our LLVM formalization allowed for 
a smooth integration of the cost aspect, reusing most of the existing formaliza- 
tion nearly unchanged. Note that we opted to integrate the cost aspect into the 
existing top layer, which converts between deep and shallow embedding. Alter- 
natively, we could have added another layer on top of the shallow embedding. 
While the latter would have been the cleaner design, we opted for the former 
approach to avoid the boilerplate of adding a new layer. This was feasible as the 
original top layer was quite thin, such that adding another aspect there did not 
result in excessive complexity. 


10 See Section 3.3 for an explanation of our cost model. 

11 Actually, the only change to the original formalization is the introduction of the 
ll_call instruction, to make the costs of a function call visible. 

12 Primitive while loops are not strictly required, as they can always be replaced by tail 
recursion. Indeed, our code generator can be configured to not accept while loops, and 
our preprocessor can automatically convert while loops to tail-recursive functions. 
However, the efficiency of the generated code then relies on LLVM’s optimization 
pass to detect the tail recursion and transform it to a loop again. 


3.3 Cost Model 


As a cost model for running time, we chose to count how often each instruction is 
executed. That is, we set cost = string → nat, where the string encodes the name 
of an instruction. It is straightforward to define 0 and + such that (cost, 0, +) 
forms a monoid. It is thus a valid cost model for our monad. 
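
The monoid structure can be illustrated with a minimal Python sketch. It assumes costs are represented as dictionaries from instruction names to counts; the helper names `plus` and `coin` are hypothetical and not part of the formalization.

```python
# Sketch only: a cost maps instruction names (strings) to execution counts.
ZERO = {}  # neutral element: every instruction is executed 0 times


def plus(a, b):
    """Pointwise addition of two costs."""
    out = dict(a)
    for name, n in b.items():
        out[name] = out.get(name, 0) + n
    return out


def coin(name, n=1):
    """Cost of executing instruction `name` exactly n times (n > 0)."""
    return {name: n}


# Two loads followed by one add.
example = plus(coin("load", 2), coin("add"))
```

Since addition is pointwise on natural numbers, associativity and neutrality of `ZERO` carry over directly from the naturals.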

But how realistic is our cost model, counting LLVM instructions? During 
compilation, LLVM text will be transformed by LLVM’s optimizer, and finally, 
LLVM’s back end will translate LLVM instructions to machine instructions. 
Moreover, the actual running time of a machine program does not only depend 
on the number of executed instructions, but effects like pipeline flushes and cache 
misses also play an important role. Thus, without factoring in the details of the 
optimization passes and the target machine architecture, our cost model can, at 
best, be a rough approximation of the actual running time. 

However, we can sensibly assume that a single instruction in the original 
LLVM text will result in at most a (small) constant number of machine instruc- 
tions, and that each machine instruction has a constant worst-case execution 
time. Thus, the steps counted by our model linearly correlate to an upper bound 
of the actual execution time, though the exact correlation depends on the actual 
program, optimizer passes, and target architecture. Hence, while our cost model 
cannot be used for precise statements about execution time, it can be used to 
prove worst-case complexity. That is, a program that we have proved efficient 
will be compiled to an efficient machine program. Moreover, we can hope that 
the constant factors in the proved complexity are related to the actual constant 
factors in the machine program, i.e., an LLVM program with small constant 
factors will compile to a machine program with small constant factors. 

The above discussion justifies the following design choices: The insertvalue 
and extractvalue instructions, which are used to construct and destruct tuple 
values, have no associated costs. The main reason for this design is to enable 
transparent use of tupled values, e.g., to encode the state of a while loop. We 
expect LLVM to translate the members of the tuple to separate registers anyway, 
such that no real costs are associated with tupling/untupling. 

We define the malloc instruction to take cost proportional to the number 
of allocated elements. Note that LLVM itself does not provide memory man- 
agement, and our code generator forwards memory management instructions to 
the libc implementation of the target platform. We use the calloc function here, 
which is supposed to initialize the allocated memory with zeros. While the exact 
costs of that are implementation dependent, they certainly will depend on the 
size of the allocated block. 

Charguéraud and Pottier [6, §2.7] discuss the adequacy of abstract cost mod- 
els in a functional setting. In their classification, our abstraction is on Level 2. 


3.4 Reasoning Setup 


Once we have defined the semantics, we need to set up some basic reasoning 
infrastructure. The original Isabelle-LLVM already comes with a quite generic 
separation logic and verification condition generation framework. Here, we report 
on our extensions to resources using time credits. 


Separation Logic with Time Credits Our reasoning infrastructure is based on 
separation logic with time credits [1,6,10]. We follow the algebraic approach of 
Calcagno et al. [B], using an earlier extension of Klein et al. [18]. 

A separation algebra on type α induces a separation logic on assertions that 
are predicates over α. To guide intuition, elements of α are called heaps here. We 
use the following separation logic operators: The assertion ↑Φ holds for an empty 
heap if Φ holds, □ = ↑True describes the empty heap, and ∃A is the existential 
quantifier lifted to assertions. The separating conjunction P ⋆ Q describes a heap 
composed of two disjoint parts, one described by P and the other described 
by Q, and entailment P ⊢ Q states that Q holds for every heap described by P. 

Separation algebras naturally extend over product and function types, i.e., for 
separation algebras α, β, and any type γ, also α × β and γ → α are separation 
algebras, where the operations are lifted pointwise. 

Note that enat forms a separation algebra, where elements, i.e., time credits, 
are always disjoint. Hence, ecost = string → enat and amemory × ecost are also 
separation algebras, where amemory is the separation algebra that we already 
used in [15] to describe the abstract memory of LLVM. Thus, amemory × ecost 
induces a separation logic with time credits that match our cost model. The 
time credit assertion $c = (λa. a=(0,c)) describes an empty memory (0) and 
precisely the time c. The primitive assertions on amemory are lifted analogously 
to describe no time credits. 


Weakest Precondition and Hoare Triples We start by defining a concrete state 
cstate that describes the memory content and the available resources: 


cstate = memory × ecost 


where memory is the memory type from our original LLVM formalization. Based 
on this, we define the weakest precondition predicate: 


wp :: α M → (α → cstate → bool) → cstate → bool 
wp m Q (s,cc) = (∃r c s'. m s = SUCC r c s' ∧ c ≤ cc ∧ Q r (s', cc−c)) 


Intuitively, the cost cc stored in the state is the credit available to the program. 
The weakest precondition holds if the program runs with real costs c that are 
within the available credit, and Q holds for the result r, the new memory s', and 
the new credit, cc−c, which is the old credit reduced by the actually required 
costs. Note that actual costs have type cost = string → nat, i.e., are always 
finite, while the credits have type ecost = string → enat, i.e., there can be infinite 
credits. Setting the credit to be infinite for all instruction types yields the classical 
weakest precondition that requires termination, but enforces no time limit. 
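
The following Python sketch mimics this weakest precondition on a toy model. It assumes a program is a function from a memory to either `None` (failure) or a triple of result, cost and new memory; the program `load0` and the list-shaped memory are hypothetical illustrations, not the Isabelle definitions.

```python
import math


def leq(c, cc):
    """Pointwise c <= cc; currencies missing from cc count as 0 credits."""
    return all(n <= cc.get(name, 0) for name, n in c.items())


def minus(cc, c):
    """Pointwise cc - c; only meaningful when leq(c, cc) holds."""
    return {name: n - c.get(name, 0) for name, n in cc.items()}


def wp(m, Q, state):
    """Toy weakest precondition: m must succeed within the available credit."""
    s, cc = state
    outcome = m(s)
    if outcome is None:  # failure: the precondition does not hold
        return False
    r, c, s2 = outcome
    return leq(c, cc) and Q(r, (s2, minus(cc, c)))


def load0(s):
    """Hypothetical program: read cell 0 at the price of one load."""
    return (s[0], {"load": 1}, s)
```

With infinite credit (`math.inf`) for every currency, the credit check always succeeds, which corresponds to the classical weakest precondition mentioned above.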

Our concrete state type, in particular the memory, does not form a separation 
algebra, as the natural memory model of LLVM has no natural notion of partial 
memories. Thus, we define an abstraction function that maps a concrete state 
to an abstract state astate, which forms a separation algebra: 


astate = amemory × ecost        abs (m, c) = (absm m, c) 


Again, amemory and absm are the abstract state and abstraction function from 
the original LLVM formalization. The costs already form a separation algebra, 
so we do not abstract them further. 

With this, we can instantiate a generic VCG infrastructure: let cstate be 
concrete states, wp :: α M → (α → cstate → bool) → cstate → bool be a weak- 
est precondition predicate, and astate an abstract state, linked to concrete states 
via an abstraction function abs :: cstate → astate. Further, assume that wp dis- 
tributes over conjunctions, i.e., 


wp c Q1 s ∧ wp c Q2 s = wp c (λr s'. Q1 r s' ∧ Q2 r s') s 


Finally, let ⊤ be an affine top [5], i.e., an assertion with □ ⊢ ⊤ and ⊤ ⋆ ⊤ = ⊤, 
which captures resources that can be safely discarded. We define the Hoare triple 
{P} c {Q} to hold iff: 


∀F s. (P ⋆ F) (abs s) ⟹ wp c (λr s'. (Q r ⋆ ⊤ ⋆ F) (abs s')) s 


Intuitively, {P} c {Q} holds if, for all states that contain a part described by 
assertion P, command c terminates with result r and a state where that part 
is replaced by a part described by Q r ⋆ ⊤, and the rest of the state has not 
changed. Here, Q r is the postcondition of the Hoare triple, and ⊤ describes 
resources that may be left over and can be discarded. 

In our case, we set ⊤ to describe the empty memory and any amount of time 
credits. This matches the intuition that a program must free all its memory, but 
may run faster than estimated, i.e., leave over some time credits. Note that our 
wp distributes over conjunctions. 

The generic VCG infrastructure now provides us with a syntax-driven VCG 
with a simple frame inference heuristic. 


3.5 Primitive Setup 


Once we have defined the basic reasoning infrastructure, we have to prove Hoare 
triples for the basic LLVM instructions and control flow combinators. As we have 
added the cost aspect only at the top level of our semantics, we can reuse most of 
the material from our original LLVM formalization without time. Technically, we 
instantiate our reasoning infrastructure with a weakest precondition predicate 
wpn, which only holds for programs that consume no costs. We define: 


wpn m Q s = wp m (FST ∘ Q) (s,0)    where FST P = λ(s,c). P s ∧ c=0 


The resulting reasoning infrastructure is identical to that of our original 
formalization, most of which could be reused. Only for the topmost level, i.e., for 
those functions that correspond to the functional semantics of the actual LLVM 
instructions, we lift the Hoare triples over wpn to Hoare triples over wp: 


{P} c {Q}wpn ⟹ {FST P} c {FST ∘ Q} 

Example 6. Recall the low-level raw_load and the high-level ll_load instruction 
from Section 3.2. The raw_load instruction consumes no costs, and our original 
LLVM formalization provides the following Hoare triple: 


{raw_pto p x} raw_load p {λr. ↑(r=x) ⋆ raw_pto p x}wpn 

This can be transferred to a Hoare triple over wp: 

{FST (raw_pto p x)} raw_load p {λr. ↑(r=x) ⋆ FST (raw_pto p x)} 

which is then used to prove the Hoare triple for the program ll_load: 

{$load ⋆ pto p x} ll_load p {λr. ↑(r=x) ⋆ pto p x} 

where pto p x = FST (raw_pto (the_raw_ptr p) (to_val x)). 


Using the VCG and the Hoare triples for the LLVM instructions, we can now de- 
fine and prove correct data structures and algorithms. While this works smoothly 
for simple data structures like arrays, it does not scale to more complex develop- 
ments. In contrast, NREST does scale, but lacks support for the low-level pointer 
reasoning required for basic data structures. In the next section, we show how to 
combine both approaches, with the LLVM level providing basic data structures 
and the NREST level using them as building blocks for larger algorithms. 


4 Automatic Refinement 


In this section we describe a tool to synthesize a concrete program in the LLVM- 
monad from an abstract algorithm in the NREST-monad. It can automatically 
refine abstract functional data structures to imperative heap-based ones. We 
will describe the synthesis predicate hnr that connects the two monads, the 
synthesis tool, and a way to extract Hoare triples from hnr predicates. Finally, 
we will discuss an effect that prevents combining hnr with data refinements in 
the NREST-monad in the general case. 


4.1 Heap nondeterminism refinement 


The heap nondeterminism refinement predicate hnr Γ m† Γ' R m intuitively ex- 
presses that the concrete program m† computes a concrete result that relates, via 
the refinement assertion R, to a result in the abstract program m, using at most 
the resources specified by m for that result. A refinement assertion describes 
how an abstract variable is refined by a concrete value on the heap. It can also 
contain time credits. The assertions Γ and Γ' constitute the heaps before and 
after the computation and typically are a separating conjunction of refinement 
assertions for the respective parameters of m† and m. Formally, we define: 


hnr Γ m† Γ' R m = m ≠ fail ⟹ 
  (∀F s c. (Γ ⋆ F) (absm s, c) ⟹ 
    (∃ra ca. elapse (return ra) ca ≤ m 
      ∧ wp m† (λr (s',c'). (Γ' ⋆ R r ra ⋆ F ⋆ ⊤) (absm s', c')) (s, c+ca))) 

The predicate holds if either the abstract program fails or if, for all heaps and 
resources (s,c) that satisfy the pre-assertion Γ with some frame F, there exists an 
abstract result and cost (ra,ca) that refine m, and m† terminates with concrete 
result r in a state s' where Γ' with the frame holds, and r relates to the abstract 
result via assertion R. The execution costs of m† and the time credits c' required 
by the post-assertion Γ' are paid for by the specified cost ca and the time credits 
c described by the pre-assertion Γ. Thus, the real costs are paid by a combination 
of the advertised costs in the abstract program and the potential difference of 
Γ and Γ', allowing us to seamlessly model amortized computation costs. 

Using the affine top ⊤, it is possible for the program to throw away portions 
of the heap. Note that our ⊤ can only discard time credits. Memory must be 
explicitly freed by the concrete program m†. 

Also note that hnr is not tied to the LLVM semantics specifically. It actually 
is a general pattern for combining the NREST-monad with any other program 
semantics that provides a weakest precondition and a separation algebra for data 
and resources. 


4.2 The Sepref Tool 


The Sepref tool automatically synthesizes a concrete program in the 
LLVM-monad from an abstract algorithm in the NREST-monad. It symbolically 
executes the abstract program while maintaining refinements for the abstract 
variables to a concrete representation and generates a concrete program as well 
as a valid hnr predicate. Proof obligations¹³ that occur during this process are 
discharged automatically, guided by user-provided hints where necessary. 

The synthesis requires rules for all abstract combinators. For example, bind 
is processed by the following rule: 


1  ⟦ hnr Γ m† Γ' Rx m; 
2    (∀x x†. hnr (Rx x† x ⋆ Γ') (f† x†) (R'x x† x ⋆ Γ'') Ry (f x)); 
3    MK_FREE R'x free ⟧ ⟹ 
4  hnr Γ (x† ← m†; r† ← f† x†; free x†; return r†) Γ'' Ry (x ← m; f x) 


To refine x ← m; f x, we first execute m, synthesizing the concrete program 
m† (line 1). The state after m is Rx x† x ⋆ Γ', where x is the result created 
by m. From this state, we execute f x and synthesize f† x† (line 2). The new 
state is R'x x† x ⋆ Γ'' ⋆ Ry y† y, where y is the result of f x. Now, the inter- 
mediate variable x goes out of scope and has to be deallocated. The predicate 
MK_FREE R'x free (line 3) states that free is a deallocator for data structures 
implemented by refinement assertion R'x. Note that free can only use time credits 
that are stored in R'x. Typically, these are paid for during creation of the data 
structure. This way, amortization can be used effectively to hide the necessary 
free operation and its costs in the abstract program. 

All other combinators (rec_c, if_c, while_c, etc.) have similar rules that are 
used to decompose an abstract program into parts, synthesize corresponding con- 
crete parts recursively, and combine them afterwards with the respective combi- 
nators from LLVM. At the leaves of this decomposition, atomic operations need 
to be provided with suitable synthesis predicates. 


13 E.g. from implementing mathematical integers with fixed-bit machine words. 

An example is a list lookup that is implemented by an array: 


hnr (arrayA p xs ⋆ snatA i† i) 
    (array_nth p i†) 
    (arrayA p xs ⋆ snatA i† i) idA (list_getspec xs i (λ_. array_getcost)) 


where arrayA, snatA and idA relate a list with an array, an unbounded natural 
number with a bounded signed word, and identical elements, respectively. With 
an array at address p holding the list xs and an index i† that is a bounded 
signed word representing an unbounded natural number i, array_nth leaves the 
parameters unchanged and extracts the element specified by list_getspec, incurring 
costs array_getcost = $ofs_ptr + $load. 

Ideally, each operation has its own currency (e.g. list_get). However, as our 
definition of hnr does not support currency refinement, the basic operations must 
use the currencies of the LLVM cost model. To still obtain modular hnr rules, 
we encapsulate specifications for data structures with their cost, e.g. by defining 
array_getspec = list_getspec (λ_. array_getcost). These can easily be introduced in 
an additional refinement step. Automating this process, and possibly integrating 
currency refinement into hnr, is left to future work. 


4.3 Extracting Hoare Triples 


Note that hnr predicates cannot always be expressed as Hoare triples, as the 
running time bound of the abstract program may depend on the result, which 
we cannot refer to in the precondition of a Hoare triple, where we have to express 
the allowed running time as time credits. However, if the running time bound 
does not depend on the result, we can write hnr as a Hoare triple: 


hnr Γ m† Γ' R (spec Φ (λ_. T)) = {$T ⋆ Γ} m† {λr. Γ' ⋆ (∃A ra. R r ra ⋆ ↑(Φ ra))} 


While intermediate components might not be of this form, final algorithms typ- 
ically are. At the end of a development, this rule allows us to extract a Hoare triple 
in the underlying LLVM semantics, cutting out the NREST-monad. For validat- 
ing the correctness claim of an algorithm, only the final Hoare triple needs to be 
inspected, which only uses concepts of the underlying semantics. 

Note that the above rule is an equivalence. Thus, it can also be used to obtain 
synthesis rules from Hoare triples provided by the basic VCG infrastructure. 


4.4 Attain Supremum 


We comment on a problem that arises when composing hnr predicates and data 
refinement in the NREST monad. Consider the following programs and relations: 


m  = res [x ↦ $a, y ↦ $b]         RD = {(z,x),(z,y)} 
m' = res [z ↦ $a + $b]            RA = idA 
m† = consume ($a + $b); return z 

Data refinement defines the resource bound for a concrete result (here z) as 
the supremum over all bounds of related results (here x, y). Thus, we have 
m' ≤ ⇓D RD m. Moreover, we trivially have hnr □ m† □ RA m'. Intuitively, we 
want to compose these two refinements, to obtain hnr □ m† □ (RA ∘ RD) m. 
However, as our definition of hnr does not form a supremum, this would require 
$a + $b ≤ $a or $a + $b ≤ $b, which obviously does not hold. 
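
A small Python sketch makes the problem concrete. It assumes resource bounds are dictionaries from currencies to amounts, with one coin of currency a for the first abstract result and one coin of currency b for the second; the helper names `sup` and `leq` are hypothetical.

```python
def sup(a, b):
    """Pointwise supremum of two resource bounds."""
    return {name: max(a.get(name, 0), b.get(name, 0)) for name in set(a) | set(b)}


def leq(a, b):
    """Pointwise a <= b; missing currencies count as 0."""
    return all(n <= b.get(name, 0) for name, n in a.items())


bound_x = {"a": 1}  # bound of the first abstract result
bound_y = {"b": 1}  # bound of the second abstract result

# Bound of the concrete result related to both abstract results.
bound_z = sup(bound_x, bound_y)

# The supremum is an upper bound of both, but is attained by neither.
attained = leq(bound_z, bound_x) or leq(bound_z, bound_y)
```

Because the two currencies are incomparable, the supremum is strictly larger than either individual bound, so no single abstract result can serve as a witness.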

We have not yet found a way to define hnr or ⇓D in a form that does not 
exhibit this effect. Instead, we explicitly require that the supremum of the data 
refinement has a witness. The predicate attains_sup m' m RD characterizes that 
situation: it holds if, for all results r of m', the supremum of the set of all ab- 
stractions (r,_) ∈ RD applied to m is in that set. This trivially holds if RD is 
single-valued, i.e., any concrete value is related to at most one abstract value, 
or if m is one-time, i.e., assigns the same resource bound to all its results. 

In practice, we do encounter non-single-valued relations,¹⁴ but they only oc- 
cur as intermediate results where the composition with an hnr predicate is not 
necessary. Also, collapsing synthesis predicates and refinements in the NREST- 
monad typically is performed for the final algorithm whose running time does 
not depend on the result, thus is one-time, and ultimately attains_sup. 


5 Case Study: Introsort 


In this section, we apply our framework to the introsort algorithm [22]. We build 
upon the verification of its functional correctness [17] to verify its running time 
analysis and synthesize competitive, efficient LLVM code for it. Following the 
“top-down” mantra, we use several intermediate steps to refine a specification 
down to an implementation. 


5.1 Specification of Sorting 
We start with the specification of sorting a slice of a list: 


slice_sortspec xs0 l h (T) = 
  assert (l ≤ h ∧ h ≤ length xs0); 
  spec (λxs. slice_sort_aux xs0 l h xs) (λ_. T) 


where slice_sort_aux xs0 l h xs states that xs is a permutation of xs0, xs is sorted 
between l and h and equal to xs0 anywhere else. 
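
As an executable reading of this predicate, the following Python sketch checks the three conditions on ordinary lists. It assumes the slice is the half-open index range from l to h and that elements are hashable and comparable with `<=`.

```python
from collections import Counter


def slice_sort_aux(xs0, l, h, xs):
    """True iff xs is xs0 with the slice [l, h) sorted and the rest unchanged."""
    is_permutation = len(xs) == len(xs0) and Counter(xs) == Counter(xs0)
    slice_sorted = all(xs[i] <= xs[i + 1] for i in range(l, h - 1))
    rest_unchanged = xs[:l] == xs0[:l] and xs[h:] == xs0[h:]
    return is_permutation and slice_sorted and rest_unchanged
```

For example, `slice_sort_aux([5, 3, 2, 4, 1], 1, 4, [5, 2, 3, 4, 1])` holds, because only positions 1 to 3 were reordered.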


5.2 Introsort’s Idea 


The introsort algorithm is based on quicksort. Like quicksort, it finds a pivot 
element, partitions the list around the pivot, and recursively sorts the two par- 
titions. Unlike quicksort, however, it keeps track of the recursion depth, and if it 
exceeds a certain value (typically ⌊2 log n⌋), it falls back to heapsort to sort the 
current partition. Intuitively, quicksort’s worst-case behaviour can only occur 
when unbalanced partitioning causes a high recursion depth, and the introsort 
algorithm limits the recursion depth, falling back to the O(n log n) heapsort al- 
gorithm. This combines the good practical performance of quicksort with the 
good worst-case complexity of heapsort. 


14 The relation oarr, described in §4.2 of earlier work by one of the authors, is used 
to model ownership of parts of a list on an abstract level and is an example of a 
relation that is not single-valued. 

Our implementation of introsort follows the implementation of libstdc++, 
which includes a second optimization: a first phase executes quicksort (with 
fallback to heapsort), but stops the recursion when the partition size falls below 
a certain threshold τ. Then, a second phase sorts the whole list with one final 
pass of insertion sort. This exploits the fact that insertion sort is actually faster 
than quicksort for almost-sorted lists, i.e., lists where any element is less than 
τ positions away from its final position in the sorted list. While the optimal 
threshold τ needs to be determined empirically, it does not influence the worst- 
case complexity of the final insertion sort, which is O(τn) = O(n) for constant 
τ. The threshold τ will be an implicit parameter from now on. 
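
The two-phase scheme can be sketched in Python as follows. This is an illustration under assumptions, not the verified implementation: the threshold value `TAU = 16` is hypothetical, the pivot is simply the middle element rather than a median of three, and the depth limit is taken as twice the floor of the binary logarithm of the slice length.

```python
import heapq

TAU = 16  # hypothetical threshold; the text leaves its value to experiments


def heapsort_slice(xs, l, h):
    """Fallback: sort xs[l:h] in O(n log n) with a binary heap."""
    heap = xs[l:h]
    heapq.heapify(heap)
    for i in range(l, h):
        xs[i] = heapq.heappop(heap)


def partition(xs, l, h):
    """Hoare partition of xs[l:h] (needs h - l >= 2); returns m with l < m < h."""
    pivot = xs[(l + h - 1) // 2]
    i, j = l - 1, h
    while True:
        i += 1
        while xs[i] < pivot:
            i += 1
        j -= 1
        while xs[j] > pivot:
            j -= 1
        if i >= j:
            return j + 1
        xs[i], xs[j] = xs[j], xs[i]


def almost_sort(xs, l, h, d):
    """Phase 1: quicksort that stops at small slices and falls back at depth 0."""
    if h - l > TAU:
        if d == 0:
            heapsort_slice(xs, l, h)
        else:
            m = partition(xs, l, h)
            almost_sort(xs, l, m, d - 1)
            almost_sort(xs, m, h, d - 1)


def final_insertion_sort(xs, l, h):
    """Phase 2: one pass of (guarded) insertion sort over the whole slice."""
    for i in range(l + 1, h):
        x, j = xs[i], i
        while j > l and xs[j - 1] > x:
            xs[j] = xs[j - 1]
            j -= 1
        xs[j] = x


def introsort(xs, l, h):
    """Sort xs[l:h] in place; trivial slices are left unchanged."""
    n = h - l
    if n > 1:
        depth = 2 * (n.bit_length() - 1)  # assumed depth limit
        almost_sort(xs, l, h, depth)
        final_insertion_sort(xs, l, h)
```

Calling `introsort(xs, 0, len(xs))` sorts the whole list; elements outside the given slice are never touched.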

While this seems like a quite concrete optimization, the two phases are al- 
ready visible in the abstract algorithm, which is defined as follows in NREST: 


introsort xs l h = 
  assert (l ≤ h); 
  n ← return h − l;                       ($sub) 
  if_c n > 1 then                         ($lt) 
    xs ← almost_sortspec xs l h;          ($almost_sort) 
    xs ← final_sortspec xs l h;           ($final_sort) 
    return xs 
  else return xs 


where almost_sortspec (T) specifies an algorithm that almost-sorts a list, con- 
suming at most T resources and final_sortspec (T) specifies an algorithm that 
sorts an almost-sorted list, consuming at most T resources. 

The program introsort leaves trivial lists unchanged and otherwise executes 
the first and second phase. Its resource usage is bounded by the sum of the 
first and second phase and some overhead for the subtraction, comparison, and 
if-then-else. Using the verification condition generator we prove that introsort is 
correct, i.e., refines the specification of sorting a slice: 


introsort xs l h ≤ ⇓C Eis (slice_sortspec xs l h ($sort)) 


where Eis = 0(sort := introsortcost) is the exchange rate used at this step and 
introsortcost = $sub + $if + $lt + $almost_sort + $final_sort is the total allotted 
cost for introsort. 
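
Applying an exchange rate can be pictured with a short Python sketch. It assumes that an exchange rate maps each abstract currency to a concrete cost, that currencies without an entry are exchanged to zero, and that costs are dictionaries; the function name `exchange` is hypothetical.

```python
def exchange(rate, cost):
    """Replace every coin of an abstract currency by the cost the rate assigns to it."""
    out = {}
    for currency, n in cost.items():
        for name, k in rate.get(currency, {}).items():
            out[name] = out.get(name, 0) + n * k
    return out


# One coin of each currency listed in the total allotted cost above.
INTROSORT_COST = {"sub": 1, "if": 1, "lt": 1, "almost_sort": 1, "final_sort": 1}

# Exchange rate that maps the single abstract currency `sort` to that cost.
E_IS = {"sort": INTROSORT_COST}

one_sort_coin = exchange(E_IS, {"sort": 1})
```

Exchanging one `sort` coin yields exactly the compound cost, which is the sense in which a single abstract coin stands for the whole operation.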


5.3 Quicksort Scheme 


The first phase can be implemented in the following way: 


 1  introsort_aux μ xs l h = 
 2    d ← depthspec l h;                                   ($depth) 
 3    rec_c (λintrosort_rec (xs,l,h,d). 
 4      assert (l ≤ h); 
 5      n ← h − l;                                         ($sub) 
 6      if_c n > τ then                                    ($lt) 
 7        if_c d = 0 then                                  ($eq) 
 8          slice_sortspec xs l h ($sort_c (μ (h−l))) 
 9        else 
10          (xs,m) ← partitionspec xs l h;                 ($partition_c (h−l)) 
11          d ← d − 1;                                     ($sub) 
12          xs ← introsort_rec (xs,l,m,d); 
13          xs ← introsort_rec (xs,m,h,d); 
14          return xs 
15      else return xs 
16    ) (xs,l,h,d) 


where partitionspec partitions a slice into two non-empty partitions, returning 
the start index m of the second partition, and depthspec specifies ⌊2 log(h − l)⌋. 

Let us first analyze the recursive part: if the slice is shorter than the threshold 
τ, it is simply returned (line 15). Unless the recursion depth limit is reached, 
the slice is partitioned using h − l partition_c coins, and the procedure is called 
recursively for both partitions (lines 10-14). Otherwise, the slice is sorted at a 
price of μ (h−l) sort_c coins (line 8). The function μ here represents the leading 
term in the asymptotic costs of the used sorting algorithm, and the sort_c coin 
can be seen as the constant factor. This currency will later be exchanged into 
the respective currencies that are used by the sorting algorithm. Note that we 
use currency sort_c to describe costs per comparison of a sorting algorithm, while 
currency sort describes the cost for a whole sorting algorithm. 

Showing that the procedure results in an almost-sorted list is straightforward. 
The running time analysis, however, is a bit more involved. We presume a func- 
tion μ that maps the length of a slice to an upper bound on the abstract steps 
required for sorting the slice. We will later use heapsort with μnlogn n = n log n. 

Consider the recursion tree of a call in introsort_rec: We pessimistically as- 
sume that for every leaf in the recursion tree we need to call the fallback sorting 
algorithm. Furthermore, we have to partition at every inner node. This has cost 
linear in the length of the current slice. For each following inner level, the lengths 
of the slices add up to the current one’s, and so do the incurred costs. Finally, 
we have some overhead at every level, including the final one. The cost of the 
recursive part of introsort_aux is: 


introsort_reccost μ (n,d) = $sort_c (μ n) + $partition_c (d ∗ n) 
                          + ((d+1) ∗ n) ∗ ($if 2 + $call 2 + $eq + $lt + $sub 2) 


The correctness of the running time bound is proved by induction over the 
recursion of introsort_rec. If the recursion limit is reached (d=0), the first sum- 
mand pays for the fallback sorting algorithm. If d>0, part of the second sum- 
mand pays for the partitioning of the current slice, then the list is split into 
two and the recursive costs are paid for by parts of all three summands. To 
bound the costs for the fallback sorting algorithm, μ needs to be superadditive: 
μ a + μ b ≤ μ (a+b). In both cases, the third summand pays for the overhead 
in the current call. 
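
The inductive step for d>0 can be sanity-checked numerically. The Python sketch below is not a proof: it assumes the fallback bound is n times the binary logarithm of n, models the per-call overhead as two if, two call, one eq, one lt and two sub coins, and checks for small slice lengths that one partitioning step plus both recursive bounds stay within the bound for the whole slice.

```python
import math

# Assumed per-call overhead bundle (coins per currency).
OVERHEAD = {"if": 2, "call": 2, "eq": 1, "lt": 1, "sub": 2}


def mu(n):
    """Assumed superadditive fallback bound: n * log2(n), with mu(0) = 0."""
    return n * math.log2(n) if n > 0 else 0.0


def rec_cost(n, d):
    """Claimed bound for a slice of length n with remaining depth d."""
    cost = {"sort_c": mu(n), "partition_c": d * n}
    for name, k in OVERHEAD.items():
        cost[name] = (d + 1) * n * k
    return cost


def plus(a, b):
    return {name: a.get(name, 0) + b.get(name, 0) for name in set(a) | set(b)}


def leq(a, b):
    return all(v <= b.get(name, 0) for name, v in a.items())


def step_cost(n1, n2, d):
    """One call that partitions n1 + n2 elements and recurses with depth d - 1."""
    here = {"partition_c": n1 + n2, **OVERHEAD}
    return plus(here, plus(rec_cost(n1, d - 1), rec_cost(n2, d - 1)))


def inductive_step_holds(max_n, max_d):
    return all(
        leq(step_cost(n1, n - n1, d), rec_cost(n, d))
        for n in range(2, max_n + 1)
        for n1 in range(1, n)
        for d in range(1, max_d + 1)
    )
```

The check mirrors the argument in the text: the partition coins and the overhead add up exactly level by level, while the fallback coins rely on superadditivity of the assumed bound.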

For d = ⌊2 log n⌋ and an O(n log n) fallback sorting algorithm (μ = μnlogn), 
introsort_reccost μnlogn is in O(n log n);¹⁵ in fact, any d ∈ O(log n) would do. 

Before executing the recursive method, introsort_aux calculates the depth 
limit d. The correctness theorem then reads: 


introsort_aux μnlogn xs l h ≤ ⇓C (Eisa (h−l)) (almost_sortspec xs l h ($almost_sort)) 


with Eisa n = 0(almost_sort := $depth + introsort_reccost μnlogn (n, ⌊2 log n⌋)). 
Note that specifications typically use a single coin of a specific currency for 
their abstract operation, which is then exchanged for the actual costs, usually 
depending on the parameters. 

This concludes the interesting part of the running time analysis of the first 
phase. It is now left to plug in an O(n log n) fallback sorting algorithm and a 
linear partitioning algorithm. 


Heapsort Independently of introsort, we have proved correctness and worst-case 
complexity of heapsort, yielding the following refinement lemma: 


heapsort xs l h ≤ ⇓C (Ehs (h−l)) (slice_sortspec xs l h ($sort)) 


where Ehs n = 0(sort := c1 + log n ∗ c2 + n ∗ c3 + (n ∗ log n) ∗ c4) for some 
constants ci :: ecost. 

Assuming that n ≥ 2,¹⁶ we can estimate Ehs n sort ≤ μnlogn n ∗ c, for c = 
c1 + c2 + c3 + c4, and thus get, for Ehs' = 0(sort_c := c): 


⇓C (Ehs (h−l)) (slice_sortspec xs l h ($sort)) 
  ≤ ⇓C Ehs' (slice_sortspec xs l h ($sort_c (μnlogn (h−l)))) 


and, by transitivity, 

heapsort xs l h ≤ ⇓C Ehs' (slice_sortspec xs l h ($sort_c (μnlogn (h−l)))) 


Note that our framework allowed us to easily convert the abstract currency from 
a single operation-specific sort coin to a sort_c coin for each comparison operation. 
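
The estimate behind this conversion can be spelled out as a short derivation. It is a sketch under the assumptions that the slice length satisfies n ≥ 2, that logarithms are to base 2, and that the inequality is read pointwise over all currencies of the constants c1 to c4.

```latex
\begin{align*}
E_{hs}\, n\, \mathit{sort}
  &= c_1 + \log n \cdot c_2 + n \cdot c_3 + (n \log n) \cdot c_4 \\
  &\le (n \log n) \cdot c_1 + (n \log n) \cdot c_2
     + (n \log n) \cdot c_3 + (n \log n) \cdot c_4
     && \text{since } 1 \le \log n \le n \le n \log n \text{ for } n \ge 2 \\
  &= \mu_{n \log n}(n) \cdot (c_1 + c_2 + c_3 + c_4)
   = \mu_{n \log n}(n) \cdot c
\end{align*}
```

Each summand is bounded separately, so the constant c simply collects the four original constants.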


Partition and Depth Computation We implement partitioning with the Hoare 
partitioning scheme using the median-of-3 as the pivot element. Moreover, we 
implement the computation of the depth limit (2⌊log(h − l)⌋) by a loop that 
counts how often we can divide by two until zero is reached. This yields the 
following refinement lemmas: 


pivot_partition xs l h ≤ ⇓C Epp (partitionspec xs l h ($partition_c (h−l))) 
calc_depth l h ≤ ⇓C (Ecd (h−l)) (depthspec l h ($depth)) 


15 More precisely, the sum over all (finitely many) currencies is in O(n log n). 
16 Note that this is a valid assumption, as heapsort will never be called for trivial slices. 

Combining the Refinements We replace slice_sortspec, partitionspec and depthspec 
by their implementations heapsort, pivot_partition and calc_depth. We call the 
resulting implementation introsort_aux2, and prove 


introsort_aux2 xs l h ≤ ⇓C (Eaux (h−l)) (introsort_aux μnlogn xs l h) 


where the exchange rate Eaux combines the exchange rates Ehs', Epp and Ecd 
for the component refinements. 

Transitive combination with the correctness lemma for introsort_aux then 
yields the correctness lemma for introsort_aux2: 


introsort_aux2 xs l h ≤ ⇓C (Eisa2 (h−l)) (almost_sortspec xs l h ($almost_sort)) 


where Eisa2 n = 0(almost_sort := ⇓C (Eaux n) (introsort_auxcost n)) and the op- 
eration ⇓C E t applies an exchange rate to a resource function. 


Refining Resources The stepwise refinement approach allows us to structure an 
algorithm verification in a way that correctness arguments can be conducted 
on a high level and implementation details can be added later. Resource cur- 
rencies permit the same for the resource analysis of algorithms: they summa- 
rize compound costs, allow reasoning on a higher level of abstraction, and can 
later be refined into fine-grained costs. For example, in the resource analysis 
of introsort_aux the currencies sort_c and partition_c abstract the cost of the re- 
spective subroutines. The abstract resource argument is independent of their 
implementation details, which are only added in a subsequent refinement step, 
via the exchange rate Eaux. 


5.4 Final Insertion Sort 


The second phase is implemented by insertion sort, repeatedly calling the sub- 
routine insert. The specification of insert for an index i captures the intuition 
that it goes from a slice that is sorted up to index i−1 to one that is sorted up 
to index i. Insertion is implemented by moving the last element to the left, as 
long as the element left of it is greater (or the start of the list has been reached). 
Moving an element to its correct position takes at most τ steps, as after the 
first phase the list is almost sorted, i.e., any element is less than τ positions 
away from its final position in the sorted list. Moreover, elements originally at 
positions greater than τ will never reach the beginning of the list, which allows for the 
unguarded optimization. It omits the bounds check for those elements, saving 
one index comparison in the innermost loop. Formalizing these arguments yields 
the implementation final_insertion_sort that satisfies 


final_insertion_sort xs l h ≤ ⇓C (Efis (h−l)) (final_sortspec xs l h ($final_sort)) 


where Efis n = 0(final_sort := final_insertioncost n), and final_insertioncost n is 
linear in n. 
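
The guarded and unguarded variants of insertion can be contrasted in a Python sketch. It is only an illustration: the function names and the explicit parameter `tau` are assumptions, and the unguarded variant is safe only on inputs where some element to the left is not greater than the inserted one, as is the case after the first phase.

```python
def insert_guarded(xs, l, i):
    """Move xs[i] left to its place in the sorted slice xs[l:i]; checks the bound l."""
    x, j = xs[i], i
    while j > l and xs[j - 1] > x:
        xs[j] = xs[j - 1]
        j -= 1
    xs[j] = x


def insert_unguarded(xs, i):
    """Same as insert_guarded, but without the bounds check.

    Only safe if some element left of position i is <= xs[i]."""
    x, j = xs[i], i
    while xs[j - 1] > x:
        xs[j] = xs[j - 1]
        j -= 1
    xs[j] = x


def final_insertion_sort(xs, l, h, tau):
    """Sort an almost-sorted slice xs[l:h]: guarded for the first tau elements."""
    split = min(l + tau, h)
    for i in range(l + 1, split):
        insert_guarded(xs, l, i)
    for i in range(split, h):
        insert_unguarded(xs, i)
```

On an input such as `[3, 2, 1, 0, 7, 6, 5, 4]` with `tau = 4`, the first four elements act as a sentinel block, so the unguarded loop always stops before leaving the slice.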
Note that final_insertion_sort and introsort_aux2 use the same currency sys- 
tem. Plugging both refinements into introsort yields introsort2 and the lemma 

introsort2 xs l h ≤ ⇓C (Eis2 (h−l)) (introsort xs l h) 


where the exchange rate Eis2 combines the rates Eisa2 and Efis. 

5.5 Separating Correctness and Complexity Proofs 


A crucial function in heapsort is sift_down, which restores the heap property 
by moving the top element down in the heap. To implement this function, we 
first prove correct a version sift_down1, which uses swap operations to move the 
element. In a next step, we refine this to sift_down2, which saves the top element, 
then executes upward moves instead of swaps, and, after the last step, moves 
the saved top element to its final position. This optimization spares half of the 
memory accesses, exploiting the fact that the next swap operation will overwrite 
an element just written by the previous swap operation. 
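
The difference between the two versions can be seen in a small Python sketch on a max-heap stored in a list. The function names follow the text, but the code is an illustrative assumption, not the verified implementation.

```python
def sift_down1(heap, i=0):
    """Restore the max-heap property below index i using swaps."""
    n = len(heap)
    while True:
        child = 2 * i + 1
        if child >= n:
            return
        if child + 1 < n and heap[child + 1] > heap[child]:
            child += 1  # pick the larger child
        if heap[child] <= heap[i]:
            return
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def sift_down2(heap, i=0):
    """Same result as sift_down1, but saves the top element and only moves children up."""
    n = len(heap)
    top = heap[i]
    while True:
        child = 2 * i + 1
        if child >= n:
            break
        if child + 1 < n and heap[child + 1] > heap[child]:
            child += 1
        if heap[child] <= top:
            break
        heap[i] = heap[child]  # one write instead of a two-write swap
        i = child
    heap[i] = top  # final move of the saved element
```

Each loop iteration of the second version performs one write where the first performs two, at the price of one extra write at the end.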

However, this refinement is not structural: it replaces swap operations by 
move operations, and adds an additional move operation at the end. At this 
point, we chose to separate the functional correctness and resource aspect, to 
avoid the complexity of a combined non-structural functional and currency 
refinement. It turns out that proving the complexity of the optimized ver- 
sion sift_down2 directly is straightforward. Thus, as sketched in Section 2.4, we 
first prove¹⁷ sift_down2 ≤ sift_down1 ≤ sift_downspec (∞), ignoring the resource 
aspect. Separately, we prove sift_down2 ≤n spec (λ_. True) sift_downcost, and 
combine the two statements to get sift_down2 ≤ sift_downspec sift_downcost. 


5.6 Refining to LLVM 


The above abstract programs implicitly come with a fixed type and comparison 
operator for the elements of the list to be sorted. Those programs use abstract 
operations and currencies for arithmetic operations on indexes, control flow, 
comparisons and read/write of a random-access iterator (abstracted by lists with 
update and lookup operations). 

When we further assume an LLVM program that refines the comparison
operator, and specify how the random-access data structure should be
implemented (we choose arrays), we can automatically synthesize an LLVM
program introsort_impl that refines $\mathit{introsort}_3$, i.e., satisfies the theorem:


$$\mathit{hnr}\ (\mathit{array}_A\ p\ xs \star \mathit{snat}_A\ l_i\ l \star \mathit{snat}_A\ h_i\ h)\ (\mathit{introsort\_impl}\ p\ l_i\ h_i)\ (\mathit{snat}_A\ l_i\ l \star \mathit{snat}_A\ h_i\ h)\ \mathit{array}_A\ (\mathit{introsort}_3\ xs\ l\ h)$$


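Combination with the refinement lemmas for $\mathit{introsort}_3$ and $\mathit{introsort}_2$, followed
by conversion to a Hoare triple, yields our final correctness statement:


$$\begin{array}{l}
l \le h \wedge h \le \mathit{length}\ xs_0 \Longrightarrow \\
\quad \{\$(\mathit{introsort\_impl}_{cost}\ (h-l)) \star \mathit{array}_A\ p\ xs_0 \star \mathit{snat}_A\ l_i\ l \star \mathit{snat}_A\ h_i\ h\} \\
\quad \mathit{introsort\_impl}\ p\ l_i\ h_i \\
\quad \{\lambda r.\ \exists xs.\ \mathit{array}_A\ r\ xs \star \uparrow(\mathit{slice\_sort\_aux}\ xs_0\ l\ h\ xs) \star \mathit{snat}_A\ l_i\ l \star \mathit{snat}_A\ h_i\ h\}
\end{array}$$


where $\mathit{introsort\_impl}_{cost} :: \mathit{nat} \Rightarrow \mathit{ecost}$ is the cost bound obtained from applying
the exchange rates $E_{is3}$ and then $E_{is2}$ to $\$_{sort}$.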


17 Note that we have omitted the function parameters for better readability. 
Note that this statement is independent of the Refinement Framework. Thus,
to believe in its meaningfulness, one only has to check the formalization of Hoare
triples, separation logic, and the LLVM semantics.

To formally prove the statement “introsort_impl has complexity O(n log n)”,
we observe that $\mathit{introsort\_impl}_{cost}$ uses only finitely many currencies, and only
finitely many coins of each currency. We define the overall number of coins as

$$\mathit{introsort\_impl}_{allcost}\ n = \sum_{c} \mathit{introsort\_impl}_{cost}\ n\ c$$

which expands to

$$\mathit{introsort\_impl}_{allcost}\ n = 4693 + 5 \cdot \log n + 231 \cdot n + 455 \cdot (n \cdot \log n)$$

which, in turn, is routinely proved to be in O(n log n).
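One way to see the last claim by hand is sketched below. It is not the Isabelle proof; it assumes that log denotes the base-2 logarithm and that $n \ge 2$, so that $\log n \ge 1$ and $n \log n \ge 1$. Under these assumptions every summand is bounded by its coefficient times $n \log n$:

```latex
\begin{align*}
4693 + 5 \log n + 231\, n + 455\, n \log n
  &\le 4693\, n \log n + 5\, n \log n + 231\, n \log n + 455\, n \log n \\
  &= 5384\, n \log n \qquad (n \ge 2).
\end{align*}
```

The constant 5384 is only the sum of the four coefficients and is not claimed to be tight.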

As a last step, we instantiate the element type to 64-bit unsigned integers and 
the comparison operation to LLVM’s icmp_ult instruction, to obtain a program 
that sorts integers in ascending order. Our code generator can export this to 
actual LLVM text and a corresponding header file for interfacing our sorting 
algorithm from C or C++. 

As LLVM does not support generics, we cannot implement a replacement for
C++'s generic std::sort<T>. However, by repeating the last step for different
types and compare operators, we can implement a replacement for any fixed T.


5.7 Benchmarks 


In this section we present benchmarks comparing the code extracted from our 
formalization with the real world implementation of introsort from the GNU 
C++ Library (libstdc++). Also, as a regression test, we compare with the code 
extracted from an earlier formalization of introsort [17] that did not verify the
running time complexity and used an earlier iteration of the Sepref framework 
and LLVM semantics without time. 

The results are shown in Figure 1. As expected, all three implementations
have similar running times. Note that the small differences are well within the 
noise of the measurements. We conclude that adding the complexity proof to our 
introsort formalization, and the time aspect to our refinement process has not 
introduced any timing regressions in the generated code. Note, however, that the 
code generated by our current formalization is not identical to what the original 
formalization generated. This is mainly due to small changes in the formalization 
introduced when adding the timing aspect. 


6 Conclusions 


We have presented a refinement framework for the simultaneous verification of 
functional correctness and complexity of algorithm implementations with com- 
petitive practical performance. 

We use stepwise refinement to separate high-level algorithmic ideas from
low-level optimizations, enabling convenient verification of highly optimized
algorithms. The novel concept of resource currencies also allows structuring of the
complexity proofs along the refinement chain. Our framework refines down to
the LLVM intermediate representation, such that we can use a state-of-the-art
compiler to generate performant programs.

Fig. 1. Comparison of the running time measured for the code generated by the
formalization described in this paper (Isabelle-LLVM), the original formalization from [17]
(notime), and the libstdc++ implementation. Arrays with 10° uint64s with various
distributions were sorted, and we display the smallest time of 10 runs. The programs
were compiled with clang-10 -O3, and run on an Intel XEON E5-2699 with 128GiB
RAM and 256K/55M L2/L3 cache. See for details of the benchmarking method.

As a case-study, we have proved the functional correctness and complexity 
of the introsort sorting algorithm. Our verified implementation performs on par 
with the (unverified) state-of-the-art implementation from the GNU C++ Li- 
brary. It also provably meets the C++11 standard library [7] specification for 
std::sort, which in particular requires a worst-case time complexity of O(n log n). 
We are not aware of any other verified real-world implementations of sorting al- 
gorithms that come with a complexity analysis. 

Our work is a combination and substantial extension of an earlier refinement
framework for functional correctness, which also comes with a verification
of introsort [17], and a refinement framework for a single enat-valued currency
[L]. In particular, we have generalized the refinement framework to arbitrary 
resources, introduced currencies that help organizing refinement proofs, extended 
the LLVM semantics and reasoning infrastructure with a cost model, connected 
it to the refinement framework via a new version of the Sepref tool, and, finally, 
added the complexity analysis for introsort. 


6.1 Related Work 


Nipkow et al. [23, §4.1] collect verification efforts concerning sorting algorithms.
We add a few instances verifying running time: Wang et al. use TiML to
verify correctness and asymptotic time complexity of mergesort automatically.
Zhan and Haslbeck [26] verify functional correctness and asymptotic running
time of imperative versions of insertion sort and mergesort. We build
on earlier work by Lammich [17] and provide the first verification of functional
correctness and asymptotic running time of heapsort and introsort.

The idea to generalize the nres monad to resource types originates
from Carbonneaux et al. [4]. They use potential functions (state → enat) instead
of predicates (state → bool), present a quantitative Hoare logic, and extend
the CompCert compiler to preserve properties of stack usage from programs in
Clight to compiled programs.

We see our paper in the line of research concerning simultaneously verifying
functional correctness and worst-case time complexity of algorithms. Atkey [I]
pioneered resource analysis with separation logic; Guéneau et al. [9] present a
framework that uses time credits in Coq and apply it to involved algorithms and
data structures [10,6]. We further develop their work in three ways: First, while
time credits usually are natural numbers or integers [10], we generalize
to an abstract resource type and specifically use resource currencies for a
fine-grained analysis. Second, we use stepwise refinement to structure the
verification and make the resource analysis of larger use cases manageable. Third,
we provide facilities to automatically extract efficient, competitive code from the
verification. The following are the most complex algorithms and data structures
with verified running time analysis using time credits and separation logic we are
aware of: a linear time selection algorithm [26], an incremental cycle detection
algorithm [10], Union-Find [6], Edmonds-Karp and Kruskal’s algorithm [I].


6.2 Future Work 


A verified compiler down to machine code would further reduce the trusted code 
base of our approach. While that is not expected to be available soon for LLVM in 
Isabelle, the NREST-monad and the Sepref tool are general enough to connect 
to a different back end. Formalizing one of the CompCert C semantics in 
Isabelle, connecting it to the NREST-monad and then processing synthesized C 
code with CompCert’s verified compiler would be a way to go. 

In this paper we apply our framework to verify an involved algorithm that 
only uses basic data structures, i.e. arrays. A next step is to verify more involved 
data structures, e.g., by porting existing verifications of the Imperative Collections
Framework [16] to LLVM. We do not yet see how to reason about the
running time of data structures like hash maps, where worst-case analysis would 
be possible but not useful. In general, extending the framework to average-case 
analysis and probabilistic programs are exciting roads to take. 

We plan to implement more automation, saving the user from writing boil- 
erplate code when handling resource currencies and exchange rates. 

Neither the LLVM nor the NREST level of our framework is tied to running 
time. Applying it to other resources like maximum heap space consumption 
might be a next step. 
Abstract. Determining upper bounds on the time complexity of a program is
a fundamental problem with a variety of applications, such as performance de- 
bugging, resource certification, and compile-time optimizations. Automated tech- 
niques for cost analysis excel at bounding the resource complexity of programs 
that use integer values and linear arithmetic. Unfortunately, they fall short when 
execution traces become more involved, esp. when data dependencies may affect 
the termination conditions of loops. In such cases, state-of-the-art analyzers have
been shown to produce loose bounds, or even no bound at all.

We propose a novel technique that generalizes the common notion of recurrence 
relations based on ranking functions. Existing methods usually unfold one loop 
iteration, and examine the resulting relations between variables. These relations 
assist in establishing a recurrence that bounds the number of loop iterations. We 
propose a different approach, where we derive recurrences by comparing whole 
traces with whole traces of a lower rank, avoiding the need to analyze the com- 
plexity of intermediate states. We offer a set of global properties, defined with re- 
spect to whole traces, that facilitate such a comparison, and show that these prop- 
erties can be checked efficiently using a handful of local conditions. To this end, 
we adapt state squeezers, an induction mechanism previously used for verifying 
safety properties. We demonstrate that this technique encompasses the reasoning 
power of bounded unfolding, and more. We present some seemingly innocuous,
yet intricate, examples that previous tools based on cost relations and control
flow analysis fail to solve, and on which our squeezer-powered approach succeeds.


1 Introduction 


Cost analysis is the problem of estimating the resource usage of a given program, over 
all of its possible executions. It complements functional verification—of safety and 
liveness properties—and is an important task in formal software certification. When 
used in combination with functional verification, cost analysis ensures that a program 
is not only correct, but completes its processing in a reasonable amount of time, uses a 
reasonable amount of memory, communication bandwidth, etc. In this work we focus 
on run-time complexity analysis. While the area has been studied extensively, e.g., [19], 
B81, (3), LA, (6), Hel, (21), (12), [9], the general problem of constraining the number 
of iterations in programs containing loops with arbitrary termination conditions remains 
hard. 

A prominent approach to computing upper bounds on the time complexity of a pro- 
gram identifies a well-founded numerical measure over program states that decreases in 
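#include <string.h>   /* memset */

/* print() is used as in the original figure and is assumed to be declared elsewhere. */
void binary_counter(unsigned int n) {
    unsigned int c[n];                       /* c[0] is the least significant bit */
    memset(c, 0, n * sizeof(unsigned int));

    unsigned int i = 0;
    while (i < n) {
        if (c[i] == 1) { c[i] = 0; i++; }              /* scan 1-prefix */
        else           { c[i] = 1; i = 0; print(c); }  /* increment */
    }
}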


Fig. 1. A program that produces all combinations of n bits. 


every step of the program, also called a ranking function. In this case, an upper bound on 
the measure of the initial states comprises an upper bound on the program’s time com- 
plexity. Finding such measures manually is often extremely difficult. The cost relations 
approach, dating back to [28], attempts to automate this process by using the control 
flow graph of the program to extract recurrence formulas that characterize this measure. 
Roughly speaking, the recurrences relate the measures (costs) of adjacent nodes in the 
graph, taking into account the cost of the step between them. In this way, the cost rela- 
tions track the evolution of the measure between every pair of consecutive states along 
the executions of the program. 


One limitation of cost relations is the need to capture the number of steps remaining 
for execution in every state, that is, all intermediate states along all executions. If the 
structure of the state is complex, this may require higher order expressions, e.g.,
summing over an unbounded number of elements. As an example, consider the program in
Fig. 1, which implements a binary counter represented by an array of bits.


In this case, a ranking function that decreases between every two consecutive iter- 
ations of the loop, or even between two iterations that print the value of the counter, 
depends on the entire content of the array. Attempting to express a ranking function 
over the scalar variables of this program is analogous to abstracting the loop as a finite- 
state system that ignores the content of the array, and as such contains transition cycles 
(e.g., the abstract state $\langle n \mapsto n_0, i \mapsto 0 \rangle$, obtained by projecting the state to the scalar
variables only, repeats multiple times in any trace), meaning that no strictly decreasing
function can be defined in this way. Similarly, any attempt to consider a bounded
number of bits will encounter the same difficulty.
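The repetition of the projected state can be observed directly by simulating the loop. The following Python sketch is a port of the loop in Fig. 1 written for illustration (the function name and the recorded state layout are assumptions, not part of the paper's tooling); it records the state before every iteration and counts how often the scalar projection with i = 0 occurs.

```python
def binary_counter_trace(n):
    """Simulate the loop of Fig. 1; return the list of states (i, c)
    observed at the start of every loop iteration."""
    c = [0] * n
    i = 0
    trace = []
    while i < n:
        trace.append((i, tuple(c)))
        if c[i] == 1:        # scan 1-prefix
            c[i] = 0
            i += 1
        else:                # increment
            c[i] = 1
            i = 0
    return trace


trace = binary_counter_trace(3)
iterations = len(trace)                          # number of loop iterations
visits_i0 = sum(1 for i, _ in trace if i == 0)   # occurrences of the projection i = 0
all_distinct = len(set(trace)) == len(trace)     # full states never repeat

print(iterations, visits_i0, all_distinct)
```

For n = 3 the full states are pairwise distinct, yet the projection to (n, i) with i = 0 is visited several times, so a function of the scalar variables alone cannot decrease strictly along the trace.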


In this paper, we propose a novel approach for extracting recurrence relations cap- 
turing the time complexity of an imperative program, modeled as a transition system, 
by relating whole traces instead of individual states. The key idea is to relate a trace to
(one or more) shorter traces. This allows us to formulate a recurrence that resolves to the
length of the trace and recurs over the values at the initial states only. We sidestep the
need to take into account the more complex parts of the state that change along the trace 
(e.g., in the case of the binary counter, the array is initialized with zeros). 


Our approach relies on the notion of state squeezers [22], previously used exclu- 
sively for the verification of safety properties. We present a novel aspect where the 
same squeezers can be used to determine complexity bounds, by replacing the safety 
property check with trace length judgements. 
Squeezers provide a means to perform induction on the “size” of (initial) states to
prove that all reachable states adhere to a given specification. This is accomplished 
by attaching ranks from a well-founded set to states, and defining a squeezer function 
that maps states to states of a lower rank. Note that the notion of a rank used in our 
work is distinct from that of a ranking function, and the two should not be confused; in 
particular, a rank is not required to decrease on execution steps. Previously, squeezers 
were utilized for safety verification: the ability to establish safety is achieved by having
the squeezer map states in a way that forms (a relaxed form of) a simulation relation,
ensuring that the traces of the lower-rank states simulate the traces of the higher-rank
states. Due to the simulation property, which is verified locally, safety over states with
a base rank carries over (by induction over the rank) to states of any higher rank.


In this work, we use the construction of well-founded ranks and squeezers to de- 
fine a recurrence formula representing (an upper bound on) the time complexity of the 
procedure being analyzed. We do so by expressing the complexity (length) of traces in 
terms of the complexity of lower-rank traces. This new setting raises additional chal- 
lenges: it is no longer sufficient to relate traces to lower-rank traces; we also need to 
quantify the discrepancy between the lengths of the traces, as well as between their 
ranks. This is achieved by a certain form of simulation that is parameterized by stuttering
shapes (for the lengths) and by means of a rank bounding function (for the ranks).
Furthermore, while [22] limits each trace to relate to a single lower-rank trace, we have
found that it is sometimes beneficial to employ a decomposition of the original trace into
several consecutive trace segments, so that each segment corresponds to some (possibly
different) lower-rank trace. The segmentation simplifies the analysis of the length
of the entire trace, since it creates sub-analyses that are easier to carry out, and the 
sum of which gives the desired recurrence formula. This also enables a richer set of 
recurrences to be constructed automatically, namely non-single recurrences (meaning 
that the recursive reference may appear more than once on the right hand side of the 
equation). 

The base case of the recurrence is obtained by computing an upper bound on the 
time complexity of base-rank states. This is typically a simpler problem that may be 
addressed, e.g., by symbolic execution due to the bounded nature of the base. The solu- 
tion to the recurrence formula with the respective base case soundly overapproximates 
the time complexity of the procedure. 


We show that, conceptually, the classical approach for generating recurrences based 
on ranking functions can be viewed as a special case of our approach where the squeezer 
maps a state to its immediate successor. The real power of our approach is in the free- 
dom to define other squeezers, producing simpler recursions, and avoiding the need for 
complex ranking functions. 


Our use of squeezers for extracting recurrences that bound the complexity of imper- 
ative programs is related to the way analyses for functional programs (e.g. [20]) use the 
term(s) in recursive function calls to extract recurrences. The functional programming 
style coincidentally provides such candidate terms. The novelty of our approach is in 
introducing the concept of a squeezer explicitly, leading to a more flexible analysis as it 
does not restrict the squeezer to follow specific terms in the program. In particular, this 
allows reasoning over space in imperative programs as well. 
The main results of this paper can be summarized as follows:

— We propose a novel technique for run-time complexity analysis of imperative pro- 
grams based on state squeezers. Squeezers, together with rank-bounding functions, 
are used for extracting recurrence relations whose solutions overapproximate the 
length of executions of the input program. 

— We formalize the notions of state squeezers, partitioned simulation and rank bound- 
ing functions that underlie the approach, and establish conditions that ensure sound- 
ness of the recurrence relations. 

— We demonstrate that squeezers and rank bounding functions can be efficiently syn- 
thesized and verified, due to their compactness, especially relative to explicit rank- 
ing functions. 

— We implemented our approach and applied it successfully to several small but intri- 
cate programs, some of which could not have been handled by existing techniques. 


2 Overview 


In this section we give a high-level description of our technique for complexity analysis
using the binary counter example in Fig. 1.


Example: Binary counter The procedure in Fig. 1 receives as input a number n of
bits and iterates over all their possible values in the range $0 \ldots 2^n - 1$. The “current”
value is maintained in an array c which is initialized to zero and whose length is n. c[0]
represents the least significant bit. The loop scans the array from the least significant bit
forward looking for the leftmost 0 and zeroing the prefix of 1s. As soon as it encounters
a 0, it sets it to 1 and starts the scan from the beginning. The program terminates when
it reaches the end of the array (i = n), all array entries are zeros, and the last value was
111...; at this point all the values have been enumerated.


Existing analyses All recent methods that we are aware of (such as [16,4,20]) fail to
analyze the complexity of this procedure (in fact, most methods will fail to realize that
the loop terminates at all). One reason for that is the need to model the contents of
the array whose size is unknown at compile time. However, even if data were modeled
somehow and taken into account, finding a ranking function, which underlies existing
approaches, is hard since this function is required to decrease between any two consecutive
iterations along any execution. Here for instance, to the best of our knowledge,
such a function would depend on an unbounded number of elements of the array; it
would need to extract the current value as an integer, along the lines of $\sum_i c[i] \cdot 2^i$.

The use of a ranking function for complexity analysis is somewhat analogous to 
the use of inductive invariants in safety verification. Both are based on induction over 
time along an execution. This paper is inspired by previous work showing that 
verification can also be done when the induction is performed on the size (rank) of the 
state rather than on the number of iterations, where the size of the state may corre- 
spond, e.g., to the size of an unbounded data structure. We argue that similar concepts 
can be applied in a framework for complexity classification. That is, we try to infer 
a recurrence relation that is based on the rank of the state and correlates the lengths 
of complete executions (executions that start from an initial state) of different ranks.
This sidesteps the need to express the length of partial executions, which start from 
intermediate states. While the approach applies to bounded-state systems as well, its 
benefits become most apparent when the program contains a-priori unbounded stores, 
such as arrays. 


Our approach. Roughly speaking, our approach for computing recurrence formulas 
that provide an upper bound on the complexity of a procedure is based on the following 
ingredients: 


— A rank function $r : \mathit{init} \to X$ that maps initial states to ranks from a well-founded
set $(X, <)$ with base B. Intuitively, the rank of the initial state governs the time
complexity of the entire trace, and we also consider it to be the rank of the trace. As
we shall soon see, this rank can be significantly simpler than a ranking function.

— A squeezer $Y : \Sigma \to \Sigma$ that maintains (some variant of) a simulation relation, thus
ensuring a bona fide correspondence between higher-rank traces and lower-rank
traces through correspondence between states.

— A trace partition $p_d : \Sigma \to \{1..d\}$ that maps each state to a segment identifier
$i \in \{1..d\}$, and induces a decomposition of a trace into segments, allowing Y to
map each of them to a separate, lower-rank mini-trace.

— A rank-bounding function $\hat{Y} : X \times \{1..d\} \to X$ that provides an upper bound on the
rank of the initial states of the d mini-traces based on the rank of the higher-rank
trace. (The rank is not required to be uniform across mini-traces.)


All of these ingredients are synthesized automatically, as we discuss in Section 4. Next,
we elaborate on each of these ingredients, and illustrate them using the binary counter 
example. We further demonstrate how we use these ingredients to find recurrence for- 
mulas describing (an upper bound on) the complexity of the program. 


Some notations We adopt a standard encoding of a program as a transition system over
a state space $\Sigma$, with a set of initial states $\mathit{init} \subseteq \Sigma$ and transition function $\mathit{tr} : \Sigma \to \Sigma$,
where a transition corresponds to a loop iteration. We use $\mathit{reach} \subseteq \Sigma$ to denote the set
of reachable states, $\mathit{reach} = \{\sigma \mid \exists \sigma_0, k.\ \mathit{tr}^k(\sigma_0) = \sigma \wedge \sigma_0 \in \mathit{init}\}$.


Defining the rank of a state Ranks are taken from a well-founded set $(X, <)$ with a basis
$B \subseteq X$ that contains all the minimal elements of X. The rank function, $r : \mathit{init} \to X$,
aims to abstract away irrelevant data from the (initial) state that does not affect the
execution time, and only uses state “features” that do. When proper ranks are used, the
rank of an initial state is all that is needed to provide a tight bound on its trace length. 
Since ranks are taken from a well founded set, they can be recursed over. In the binary 
counter example, the chosen rank is n, namely, the rank function maps each state to the 
size of the array. (Notice that the rank does not depend on the contents of the array; 
in contrast, bounding the trace length from any intermediate state, and not just initial 
states, would have required considering the content of the array). 

Given the rank function, our analysis extracts a recurrence formula for the complexity
function $\mathit{comp}_r : X \to \mathbb{N} \cup \{\infty\}$ that provides an upper bound on the number of
iterations of tr based on the rank of the initial states. In our exposition, we sometimes
also refer to a time complexity function over states, $\mathit{comp}_s : \mathit{init} \to \mathbb{N} \cup \{\infty\}$, which
is defined directly on the (initial) states, as the number of iterations in an execution that
starts with some $\sigma_0 \in \mathit{init}$.


Run-time Complexity Bounds Using Squeezers 325


[Figure 2: an upper trace $\sigma_0, \ldots, \sigma_7$ of the 4-bit counter, with $r(\sigma_0) = 4$, drawn above a
lower trace $\sigma'_0, \ldots, \sigma'_3$ of the 3-bit counter, with $r(\sigma'_0) = 3$. Arrows labeled Y map upper
states to lower states, and arrows labeled tr connect consecutive states. The legend defines the
squeezer on states $\langle n, i, c \rangle$, with the array squeezed to c[1:], and gives $\hat{Y}(n) = n - 1$.]


Fig. 2. Correspondence between two traces of the binary counter program. The squeezer removes the
leftmost array entry, which represents the least significant bit. The rank is the array size, i.e., four
on the upper trace and three on the lower one. The simulation includes only 1-, 2- and 3-steps,
so the length of the upper trace is at most three times that of the lower trace, yielding an overall
complexity bound of $O(3^n)$.


Defining a squeezer The squeezer $Y : \Sigma \to \Sigma$ is a function that maps states to states
of lower-rank traces (where the rank of a trace is determined by the rank of its initial 
state), down to the base ranks B. Its importance is in defining a correspondence be- 
tween higher-rank traces and lower-rank ones that can be verified locally, by examining 
individual states rather than full traces. The kind of correspondence that the squeezer 
is required to ensure affects the flexibility of the approach and the kind of recurrence 
formulas that it may yield. To start off, consider a rather naive squeezer that satisfies the 
following local properties: 


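— rank decrease of non-base initial states: $\sigma_0 \in \mathit{init} \wedge r(\sigma_0) \notin B \Rightarrow r(Y(\sigma_0)) < r(\sigma_0)$, and
— simulation
  • initial anchor: $\sigma_0 \in \mathit{init} \Rightarrow Y(\sigma_0) \in \mathit{init}$,
  • k-step: $\sigma \in \mathit{reach} \Rightarrow \exists k.\ \mathit{tr}(Y(\sigma)) = Y(\mathit{tr}^k(\sigma))$.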


As an example, the squeezer we consider for the binary counter program is rather
intuitive: it removes the least significant bit (c[0]), and adjusts the index i accordingly.
Doing so yields a state with rank $r(Y(\sigma_0)) = r(\sigma_0) - 1$. Fig. 2 shows the correspondence
between a 4-bit binary counter and a 3-bit one. The figure illustrates the
simulation k-step property for k = 1, 2, 3: $\sigma_0$ and $\sigma_3$ are (3, 1)-stuttering, $\sigma_1$ and $\sigma_4$ are
(2, 1)-stuttering, and $\sigma_2$, $\sigma_5$ and $\sigma_6$ are (1, 1)-stuttering.
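The k-step property for this squeezer can be checked mechanically on concrete traces. The Python sketch below is an illustration only, not the paper's implementation. It encodes a state as a tuple (n, i, c), and it assumes that "adjusts the index i accordingly" means replacing i by max(i − 1, 0); it also assumes that a terminal state (i = n) simply stays where it is. For every state of the upper trace it searches for the smallest k with tr(Y(σ)) = Y(tr^k(σ)).

```python
def step(state):
    """One loop iteration of the binary counter (tr); terminal states are kept as-is."""
    n, i, c = state
    if i >= n:
        return state
    c = list(c)
    if c[i] == 1:
        c[i] = 0
        i += 1
    else:
        c[i] = 1
        i = 0
    return (n, i, tuple(c))


def squeeze(state):
    """Assumed squeezer Y: drop c[0], shrink n, shift the index (clamped at 0)."""
    n, i, c = state
    return (n - 1, max(i - 1, 0), c[1:])


def stutter_lengths(n, max_k=8):
    """For each non-terminal state of the n-bit trace, the smallest k such that
    step(squeeze(s)) == squeeze(step^k(s))."""
    state = (n, 0, (0,) * n)
    ks = []
    while state[1] < n:
        target = step(squeeze(state))
        k, probe = 1, step(state)
        while squeeze(probe) != target:
            k += 1
            probe = step(probe)
            if k > max_k:
                raise ValueError("no k-step match found within max_k steps")
        ks.append(k)
        state = step(state)
    return ks


print(stutter_lengths(4)[:7])
print(max(stutter_lengths(5)))
```

Under these assumptions the first seven values for the 4-bit counter agree with the stuttering shapes listed for $\sigma_0, \ldots, \sigma_6$, and no state needs more than three steps.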

The simulation property induces a correlation between a higher-rank trace $\tau$ and a
lower-rank one $\tau'$, such that every step of $\tau'$ is matched by k steps in $\tau$. Whenever a
state $\sigma$ satisfies the k-step property, we will refer to it as being (k, 1)-stuttering. (We
usually only care about the smallest k that satisfies the property for a given $\sigma$.) Now
suppose that there exists some $\hat{k} \in \mathbb{N}^+$ such that for every trace $\tau(\sigma_0)$ and every state
$\sigma \in \tau(\sigma_0)$, $\sigma$ is (k, 1)-stuttering with $1 \le k \le \hat{k}$. This would yield the following
complexity bound:

$$\mathit{comp}_s(\sigma_0) \le \hat{k} \cdot \mathit{comp}_s(Y(\sigma_0)) \qquad (1)$$


All your base$^{3}$ What should happen if we repeatedly apply Y to some initial state
$\sigma_0$, each time obtaining a new, lower-rank trace? Since $r(Y(\sigma_0)) < r(\sigma_0)$, and since
$(X, <)$ is well-founded, we will eventually hit some state of base rank:


$$Y(Y(\cdots(\sigma_0))\cdots) = \sigma_0^{\circ} \quad \text{such that} \quad r(\sigma_0^{\circ}) \in B$$


Hence, if we know the complexity of the initial states with a base rank, we can apply
Eq. (1) iteratively to compute an upper bound of the complexity of any initial state.

How many steps will be needed to get from an arbitrary initial state $\sigma_0$ to $\sigma_0^{\circ}$?
Clearly, this depends on the rank, and the way in which Y decreases it.

Consider the binary counter program again, with the rank $r(\sigma) = n$. $(\mathbb{N}, <)$ is
well-founded, with a single minimum 0. If we define, e.g., B = {0, 1}, we know that
the length of any trace with $n \in B$ is bounded by a constant, 2. (Bounding the length
of traces starting from an initial state $\sigma_0$ where $r(\sigma_0) \in B$ can be done with known
methods, e.g., symbolic execution.) Since the rank decreases by 1 on each “squeeze”,
we get the following exponential bound:


$$\mathit{comp}_s(\sigma_0) \le 2 \cdot 3^{n-1} = O(3^n) \qquad (2)$$


The last logical step, going from (1) to (2), is, in fact, highly involved: since Eq. (1)
is a recurrence over states, solving such a recurrence for arbitrary Y cannot be carried out
using known automated methods. Instead, we implicitly used the rank of the state, n,
to extract a recurrence over scalar values and obtain a closed-form expression. Let us
make this reasoning explicit by first expressing Eq. (1) in terms of $\mathit{comp}_r$ instead of
$\mathit{comp}_s$:

$$\mathit{comp}_r(n) \le \hat{k} \cdot \mathit{comp}_r(n - 1)$$


Here, n − 1 denotes the rank obtained when squeezing an initial state of rank n. Unlike
Eq. (1), this is a recurrence formula over $(\mathbb{N}, <)$ that may be solved algorithmically,
leading to the solution $\mathit{comp}_r(n) = O(3^n)$.
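For the binary counter, the solution can be spelled out by unrolling the scalar recurrence. The derivation below is a sketch that instantiates $\hat{k} = 3$ (the largest stuttering length observed in Fig. 2) and uses the base-case bound $\mathit{comp}_r(1) \le 2$ stated above for ranks in B; it assumes $n \ge 1$.

```latex
\begin{align*}
\mathit{comp}_r(n) &\le 3 \cdot \mathit{comp}_r(n-1) \\
                   &\le 3^2 \cdot \mathit{comp}_r(n-2) \\
                   &\;\;\vdots \\
                   &\le 3^{\,n-1} \cdot \mathit{comp}_r(1) \\
                   &\le 2 \cdot 3^{\,n-1} = O(3^n).
\end{align*}
```

Each line applies the recurrence once, so reaching the base rank 1 from rank n takes n − 1 applications, which is where the exponent in Eq. (2) comes from.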


Surplus analysis Assuming the worst k for all the states in the trace can be too conservative,
in particular if there are only a few states that satisfy the $\hat{k}$-step property, and all
the others satisfy the 1-step property. In the latter case, if we know that at most b states
in any one trace have k > 1, we can formulate the tighter bound:


$$\mathit{comp}_s(\sigma_0) \le \mathit{comp}_s(Y(\sigma_0)) + (\hat{k} - 1) \cdot b \qquad (3)$$


Incidentally, in the current setting of the binary counter program, the number of $\hat{k}$-steps
(3-steps) is not bounded. So we cannot apply the inequality (3) repeatedly on any
trace, as the number of 3-steps depends on the initial state. However, we can improve
the analysis by partitioning the trace into two parts, as we explain next.


3 https://knowyourmeme.com/memes/all-your-base-are-belong-to-us
Segments and mini-traces Note that both (1) and (3) “suffer” from an inherent restriction
that the right hand side contains exactly one recursive reference. As such, they are
limited in expressing certain kinds of complexity classes. 

In order to get more diverse recurrences, including non-single recurrences, we propose
an extension of the simulation property that allows more than one lower-rank trace:


- partitioned simulation
  • initial anchor: $\sigma_0 \in init \Rightarrow Y(\sigma_0) \in init$ (same as before),
  • k-step: $\sigma \in reach \Rightarrow \exists k.\ tr(Y(\sigma)) = Y(tr^k(\sigma))$ (same as before) or
    $Y(tr(\sigma)) \in init$ (switch)


This definition allows a new mini-trace to start at any point along a higher-rank trace
$\tau$, thus marking the beginning of a new segment of $\tau$. When this occurs, we call $tr(\sigma)$
a switch state. For the sake of uniformity, we also refer to all initial states $\sigma_0 \in init$ as
switch states. Hence, each segment of $\tau$ starts with a switch state, and the mini-traces
are the lower-level traces that correspond to the segments (these are the traces that start
from $Y(\sigma_s)$, where $\sigma_s$ is a switch state). The length of $\tau$ can now be expressed as the
sum of the lengths of the lower-level mini-traces.

However, there are two problems remaining. First, we need to extend the “rank
decrease of non-base initial states” requirement to any switch state in order to ensure
that the ranks of all mini-traces are indeed lower. Namely, we need to require that if
$\sigma_s$ is any switch state in a trace from $\sigma_0$, then $r(Y(\sigma_s)) < r(\sigma_0)$. Second, even if we
extend the rank decrease requirement, this definition does not suggest a way to bound
the number of correlated mini-traces and their respective ranks, and therefore suggests
no effective way to produce an equation for $comp_x$ as before.

To sidestep the problem of a potentially unbounded number of mini-traces, we augment
the definition of simulation with a trace partition function; to address the challenge
of the rank decrease we use a rank-bounding function, which is responsible both
for ensuring that the rank of the mini-traces decreases and for bounding their ranks.


Defining a partition We define a function $p_d : \Sigma \to \{1, \dots, d\}$, parameterized by
a constant $d$, called a partition function, that is weakly monotone along any trace
($p_d(\sigma) \le p_d(tr(\sigma))$). This function induces a partition of any trace $\tau$ into (at most)
$d$ segments by grouping states based on the value of $p_d(\sigma)$. To ensure the segments and
mini-traces are aligned, we require that switch states only occur at segment boundaries.


- d-partitioned simulation:
  • initial anchor: $\sigma_0 \in init \Rightarrow Y(\sigma_0) \in init$ (same as before),
  • k-step: $\sigma \in reach \Rightarrow \exists k.\ tr(Y(\sigma)) = Y(tr^k(\sigma))$ (same as before) or
    $Y(tr(\sigma)) \in init \wedge p_d(\sigma) < p_d(tr(\sigma))$ (segment switch)


Fig. 3. An execution trace of the binary counter program that corresponds to two mini-traces of
lower rank.


In our running example, let us change $Y$ so that it shrinks the state by removing
the most significant bit instead of the least. This leads to a partition of the execution
trace for $r(\sigma_0) = n$ into two segments, as shown in Fig. 3. The partition function is
$p_d = (i \ge n \;||\; c[n-1])\ ?\ 2 : 1$ (essentially, $c[n-1] + 1$, except that the final state is
slightly different). As can be seen from the figure, each segment simulates a mini-trace
of rank $n - 1$, with $k = 1$ for all the steps except for the last step (at $\sigma_{28}$) where $k = 2$.
In this case, it would be folly to use the recurrence (1) with $k = 2$, since all the steps
are 1:1 except one. Instead, we can formulate a tighter bound:


$$comp_s(\sigma_0) \le comp_s(\sigma_0^1) + comp_s(\sigma_0^2) + 2$$


where $comp_s(\sigma_0^1)$ and $comp_s(\sigma_0^2)$ are the lengths of the two mini-traces, and 2 is the
surplus from the switch transition $\sigma_{14} \to \sigma_{15}$ plus the 2-step at $\sigma_{28}$. In the case of this
program, we know that $r(\sigma_0^1) = r(\sigma_0^2) = r(\sigma_0) - 1$ for any initial state $\sigma_0$; therefore, turning
to $comp_x$, we can derive and solve the recurrence $comp_x(n) = 2 \cdot comp_x(n-1) + 2$,
which together with the base yields the bound:


$$comp_x(n) = 2^{n+1} - 2$$
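
The closed form can be checked with a standard shift of the unknown. The derivation below is only a sketch: the base value $comp_x(0) = 0$ is an assumption made here for illustration (it is the value consistent with the stated closed form), since the base case itself is not spelled out at this point in the text.

```latex
\begin{align*}
comp_x(n) + 2 &= 2\,\bigl(comp_x(n-1) + 2\bigr)
               && \text{(add 2 to both sides of the recurrence)} \\
              &= 2^{n}\,\bigl(comp_x(0) + 2\bigr)
               && \text{(unroll $n$ times)} \\
comp_x(n)     &= 2^{n} \cdot 2 - 2 \;=\; 2^{n+1} - 2
               && \text{(assuming } comp_x(0) = 0\text{)}
\end{align*}
```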


Clearly, a general condition is required in order to identify the ranks of the corresponding
initial states of the (lower-rank) mini-traces (and at the same time, ensure that
they decrease).


Bounding the ranks of squeezed switch states This is not a trivial task, since as previously
noted, the squeezed ranks could be different, and may depend on properties
present in the corresponding switch states. To achieve this goal, once a partition function
$p_d$ is defined, we also define a rank-bounding function $\hat{Y} : X \times \{1, \dots, d\} \to X$,
where for any $\sigma_0 \in init$ and switch state $\sigma_s$, $\hat{Y}$ provides a bound for the rank of $Y(\sigma_s)$
based on that of $\sigma_0$:


$$r(Y(\sigma_s)) \le \hat{Y}(r(\sigma_0), p_d(\sigma_s)) < r(\sigma_0) \tag{4}$$


The rightmost inequality ensures that a mini-trace that starts from $Y(\sigma_s)$ is of lower
rank than $\sigma_0$, and as such extends the “rank decrease” requirement to all mini-traces.
Based on this restriction, we can formulate a recurrence for $comp_x$, based on the initial
rank $\rho = r(\sigma_0)$, as follows:


$$comp_x(\rho) \le \sum_{i=1}^{d} comp_x(\hat{Y}(\rho, i)) + (d - 1) + \hat{k} \cdot b \tag{5}$$


where $b$, as before, is the number of k-steps for which $k > 1$, and $\hat{k}$ is the bound
on $k$ ($k \le \hat{k}$). The expression $(d - 1)$ represents the transitions between segments, and
$\hat{k} \cdot b$ represents the surplus of the $\rho$-rank trace over the total lengths of the mini-traces.


It should be clear from the definition above that $\hat{Y}$ is quite intricate. How would
we compute it effectively? The rank decrease of the initial states and the simulation
properties were local by nature, and thus amenable to validation with an SMT solver.
The $\hat{Y}$ function is inherently global, defined w.r.t. an entire trace. This makes
property (4) challenging for verification methods based on SMT. To render this check more
feasible with first-order reasoning, we introduce two special cases where the problem
of checking (4) becomes easier: rank preservation and a single segment, explained next.


Taming $\hat{Y}$ with rank preservation To obtain rank preservation, we extend the rank function
to all states (instead of just the initial states), and require that the rank is preserved
along transitions. This is appropriate in some of the scenarios we encountered. For example,
the binary counter illustration satisfies the property that along any execution
$\{\sigma_i\}_{i=0}^{\infty}$, the rank is preserved: $r(\sigma_i) = r(\sigma_{i+1})$. Rank preservation means that given a
switch state $\sigma_s$ of an arbitrary segment $i$, we know that $r(\sigma_s) = r(\sigma_0)$. Once this is set,
$\hat{Y}$ only needs to overapproximate the rank of $Y(\sigma_s)$ in terms of the rank of the same
state $\sigma_s$.


Taming $\hat{Y}$ with a single segment In this case, checking (4) reduces to a single check of
the initial state, which is the only switch state. It turns out that the restriction to a single 
segment is still expressive enough to handle many loop types. 


Putting it all together Theoretically, $r$, $Y$, $p_d$, and $\hat{Y}$ can be manually written by the
user. However, this is a rather tedious task that is straightforward enough to be automated.
We observed that all the aforementioned functions are simple enough entities
that can be expressed through a strict syntax using first-order logic. Similar to [22], we
apply a generate-and-test synthesis procedure to enumerate a space of possible expressions
representing them. This process is explained in Section 4.


3 Complexity Analysis based on Squeezers 


In this section we develop the formal foundations of our approach for extracting recurrence
relations describing the time complexity of an imperative program based on state
squeezers. We present the ingredients that underlie the approach, the conditions they
are required to satisfy, and the recurrence relations they induce. In the next section, we
explain how to extract the recurrences automatically. Given the recurrence relation, a
dedicated (external) tool may be applied to obtain a closed formula, similar to [3].
We use transition systems to capture the semantics of a program.


Definition 1 (Transition Systems). A transition system is a tuple $(\Sigma, init, tr)$, where
$\Sigma$ is a set of states, $init \subseteq \Sigma$ is a set of initial states and $tr : \Sigma \to \Sigma$ is a transition
function (rather than a transition relation, since only deterministic procedures are
considered). The set of terminal states $F \subseteq \Sigma$ is implicitly defined by $tr(\sigma) = \sigma$. An execution
trace (or a trace in short) is a finite or infinite sequence of states $\tau = \sigma_0, \sigma_1, \dots$
such that $\sigma_{i+1} = tr(\sigma_i)$ for every $0 \le i < |\tau|$. A state $\sigma \in \Sigma$ defines an execution
trace $\tau(\sigma) = \{tr^i(\sigma)\}_{i \in \mathbb{N}}$. Whenever there exists an index $0 \le k < |\tau|$ s.t.
$\sigma_k \in F$, we truncate $\tau(\sigma)$ into a finite trace $\{tr^i(\sigma)\}_{i=0}^{k}$, where $k$ is the minimal
such index. The trace is initial if it starts from an initial state, i.e., $\sigma \in init$. Unless
explicitly stated otherwise, all traces we consider are initial. The set of reachable states
is $reach = \{\sigma \in \Sigma \mid \exists \sigma_0 \in init.\ \sigma \in \tau(\sigma_0)\}$.


Roughly, to represent a program by a transition system, we translate it into a single 
loop program, where init consists of the states encountered when entering the loop, and 
transitions correspond to iterations of the loop. 

In the sequel, we fix a transition system $(\Sigma, init, tr)$ with a set $F$ of terminal states
and a set $reach$ of reachable states.


Definition 2 (Complexity over states). For a state $\sigma \in \Sigma$, we denote by $comp_s(\sigma)$
the number of transitions from $\sigma$ to a terminal state along $\tau(\sigma)$ (the trace that starts
from $\sigma$). Formally, if $\tau(\sigma)$ does not include a terminal state, i.e., the procedure does not
terminate from $\sigma$, then $comp_s(\sigma) = \infty$. Otherwise:


$$comp_s(\sigma) = \min\{k \in \mathbb{N} \mid tr^k(\sigma) \in F\}.$$


The complexity function of the program maps each initial state $\sigma_0 \in init$ to its time
complexity $comp_s(\sigma_0) \in \mathbb{N} \cup \{\infty\}$.
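
To make Definitions 1 and 2 concrete, here is a minimal sketch that represents a transition system by its transition function alone and computes $comp_s$ by following the trace. The names `comp_s` and `tr_count`, the toy counting program, and the `max_steps` cutoff (which stands in for $comp_s = \infty$) are all assumptions of this illustration.

```python
from typing import Callable, Hashable, Optional

State = Hashable


def comp_s(sigma: State, tr: Callable[[State], State],
           max_steps: int = 10_000) -> Optional[int]:
    """Number of transitions from sigma to the first terminal state.

    A state s is terminal when tr(s) == s (Definition 1). Returns None if
    no terminal state is found within max_steps transitions; this cutoff
    is an artifact of the sketch, standing in for comp_s = infinity.
    """
    for k in range(max_steps + 1):
        if tr(sigma) == sigma:  # minimal k with tr^k(sigma_0) in F
            return k
        sigma = tr(sigma)
    return None


def tr_count(state):
    """Hypothetical toy program: count i up to n; the state is (i, n)."""
    i, n = state
    return (i + 1, n) if i < n else state
```

For example, `comp_s((0, 5), tr_count)` follows five transitions before reaching the terminal state `(5, 5)`, while a transition function with no fixed point exhausts the cutoff and yields `None`.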


Our complexity analysis derives a recurrence relation for the complexity function by
expressing the length of a trace in terms of the lengths of traces that start from lower
rank states. This is achieved by (i) attaching to each initial state a rank from a well-founded
set that we use as the argument of the complexity function and that we recur
over, (ii) defining a squeezer that maps each state from the original trace to a state in
a lower-rank trace; the mapping forms a partitioned simulation according to a partition
function that decomposes a trace into segments; each segment is simulated by a (separate)
lower-rank trace, allowing us to express the length of the former in terms of the latter, and
finally, (iii) defining a rank bounding function that expresses (an upper bound on) the
ranks of the lower-rank traces in terms of the rank of the higher-rank trace. We elaborate
on these components next.


3.1 Time complexity as a function of rank 


We start by defining a rank function that allows us to express the time complexity of an 
initial state by means of its rank. 


Definition 3 (Rank). Let $X$ be a set, and $\preceq$ be a well-founded partial order over $X$.
Let $B \supseteq \min(X)$ be a base for $X$, where $\min(X)$ is the set of all the minimal elements
of $X$ w.r.t. $\preceq$. A rank function $r : init \to X$ maps each initial state to a rank in $X$. We
extend the notion of a rank to initial traces as follows. Given an initial trace $\tau = \tau(\sigma_0)$,
we define its rank to be the rank of $\sigma_0$. We refer to states $\sigma_0$ such that $r(\sigma_0) \in B$ as the
base states. Similarly, (initial) traces whose ranks are in $B$ are called base traces.

In our analysis, ranks range over $X = \mathbb{N}^m$ (for some $m \in \mathbb{N}^+$), with $\preceq$ defined by the
lexicographic order. Ranks let us abstract away data inside the initial execution states
which does not affect the worst-case bound on the trace length. For example, the length
of traces of the binary counter program (Fig. 1) is completely agnostic to the actual
content of the array at the initial state. The only parameter that affects its trace length
is the array size, and not which integers are stored inside it. Hence, a suitable rank
function in this example maps an initial state to its array length. This is despite the fact
that the execution does depend on the content of the array, and, in particular, the number
of remaining iterations from an intermediate state within the execution depends on it.
The partial order $\preceq$ and the base set $B$ will be used to define the recurrence formula as
we explain in the sequel.

We will assume from now on that $(X, \preceq, B)$, as well as the rank function, are fixed,
and can be understood from context. The rank function $r$ induces a complexity function
$comp_x : X \to \mathbb{N} \cup \{\infty\}$ over ranks, defined as follows.


Definition 4 (Complexity over ranks). The complexity function over ranks, $comp_x :
X \to \mathbb{N} \cup \{\infty\}$, is defined by:


$$comp_x(\rho) = \max\{\, comp_s(\sigma_0) \mid r(\sigma_0) \preceq \rho \wedge \sigma_0 \in init \,\}$$


The definition ensures that for every initial state $\sigma_0 \in init$, we can compute (an
upper bound on) its time complexity based on its rank, as follows: $comp_s(\sigma_0) \le
comp_x(r(\sigma_0))$. The complexity of $\rho$ takes into account all states with $r(\sigma) \preceq \rho$ and
not only those with rank exactly $\rho$, to ensure monotonicity of $comp_x$ in the rank (i.e.,
if $\rho_1 \preceq \rho_2$ then $comp_x(\rho_1) \le comp_x(\rho_2)$). Our approach is targeted at extracting a
recurrence relation for $comp_x$.
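
The relationship between $comp_s$ and $comp_x$ can be illustrated by brute force on a small system. The sketch below uses a hypothetical toy counter (not the program of Fig. 1): states are little-endian bit tuples, the all-ones tuple is terminal, every tuple is treated as an initial state, and the rank is the tuple length. All names are assumptions of this illustration.

```python
from itertools import product


def tr_inc(bits):
    """Hypothetical toy transition: increment a little-endian bit tuple.

    The all-ones tuple (including the empty tuple) is terminal.
    """
    if all(bits):
        return bits
    out = list(bits)
    for j, b in enumerate(out):
        if b == 0:
            out[j] = 1
            break
        out[j] = 0  # carry
    return tuple(out)


def comp_s(bits):
    """Number of transitions until the terminal (all-ones) state."""
    steps = 0
    while tr_inc(bits) != bits:
        bits = tr_inc(bits)
        steps += 1
    return steps


def rank(bits):
    """Rank of a state: the number of bits."""
    return len(bits)


def comp_x(rho):
    """Max of comp_s over all initial states whose rank is at most rho."""
    return max(comp_s(s)
               for n in range(rho + 1)
               for s in product((0, 1), repeat=n))
```

In this toy setting `comp_x` is monotone in the rank by construction, and `comp_s(s) <= comp_x(rank(s))` holds for every state `s`, mirroring the two properties stated after Definition 4.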


3.2 Complexity decomposition by partitioned simulation 


In order to express the length of a trace in terms of the lengths of traces of lower ranks, 
we use a squeezer that maps states from the original trace to states of lower-rank traces 
and (implicitly) induces a correspondence between the original trace and the lower-rank 
trace(s). For now, we do not require the squeezer to decrease the rank of the trace; this 
requirement will be added later. The squeezer is accompanied by a partition function 
to form a partitioned simulation that allows a single higher-rank trace to be matched to 
multiple lower-rank traces such that their lengths may be correlated. 


Definition 5 (Squeezer, $Y$). A squeezer is a function $Y : \Sigma \to \Sigma$.


Definition 6. A function $p_d : \Sigma \to \{1, \dots, d\}$, where $d \in \mathbb{N}^+$, is called a d-partition
function if for every trace $\tau = \sigma_0, \sigma_1, \dots$ it holds that $p_d(\sigma_{i+1}) \ge p_d(\sigma_i)$ for every
$0 \le i < |\tau|$.


The partition function partitions a trace into a bounded number of segments, where
each segment consists of states with the same value of $p_d$. We refer to the first state of
a segment as a switch state, and to the last state of a finite segment as a last state (note
that if $\tau$ is infinite, its last segment has no last state). In particular, this means that the
initial state of a trace is a switch state. (Note that a state may be a switch state in one
trace but not in another, while a last state is a last state in any trace, as long as the same
partition function is considered.)
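
The way a partition function splits a trace can be sketched directly for finite traces. The helper names below (`segments`, `switch_states`, `last_states`) are hypothetical; the sketch only groups consecutive states by their $p_d$ value and rejects functions that are not weakly monotone along the given trace.

```python
from itertools import groupby


def segments(trace, p_d):
    """Split a finite trace into maximal runs of equal p_d value.

    Raises ValueError if p_d is not weakly monotone along the trace,
    as required of a d-partition function (Definition 6).
    """
    values = [p_d(s) for s in trace]
    if any(a > b for a, b in zip(values, values[1:])):
        raise ValueError("p_d is not weakly monotone along the trace")
    return [list(group) for _, group in groupby(trace, key=p_d)]


def switch_states(trace, p_d):
    """First state of every segment (includes the initial state)."""
    return [seg[0] for seg in segments(trace, p_d)]


def last_states(trace, p_d):
    """Last state of every segment of a finite trace."""
    return [seg[-1] for seg in segments(trace, p_d)]
```

For instance, with the trace `0, 1, ..., 9` and a partition function that returns 1 below 4 and 2 otherwise, the switch states are `0` and `4` and the last states are `3` and `9`.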

Our complexity analysis requires the squeezer to form a partitioned simulation with
respect to $p_d$. Roughly, this means that the squeezer maps each segment of a trace to
a (lower-rank) trace that “simulates” it. To this end, we require all the states $\sigma$ within
a segment of a trace to be $(h, \ell)$-“stuttering”, for some $h \ge \ell \ge 1$. Stuttering lets
$h$ consecutive transitions of $\sigma$ be matched to $\ell$ consecutive transitions of its squeezed
counterpart. If $h = \ell$, the state $\sigma$ contributes to the complexity the same number of
steps as the squeezed state. Otherwise, $\sigma$ contributes $h - \ell$ additional steps, resulting in
a longer trace. Recall that terminal states also have outgoing transitions (to themselves);
however, these transitions do not capture actual steps and do not contribute to the
complexity. Hence, stuttering also requires that “real” transitions of $\sigma$ are matched to
“real” transitions of its squeezed counterpart, namely, if the latter encounters a terminal
state, so must the former. For the last states of segments the requirement is slightly
different, as the simulation ends at the last state and a new simulation begins in the next
segment. In order to account for the transition from the last state of one segment to the
first (switch) state of the next segment, last states are considered $(2, 1)$-stuttering if they
are squeezed into terminal states, unless they are terminal themselves⁴. In any other
case, they are considered $(1, 1)$-stuttering. The formal definitions follow.


Definition 7 (Stuttering States). A non-last state $\sigma \in \Sigma$ is called an $(h, \ell)$-stuttering
state, for $h \ge \ell \ge 1$, if: (i) $tr^{\ell}(Y(\sigma)) = Y(tr^{h}(\sigma))$; (ii) for every $i < \ell$, $tr^{i}(Y(\sigma)) \notin F$;
(iii) $tr^{\ell}(Y(\sigma)) \in F$ implies that $Y(tr^{h}(\sigma)) \in F$. A last state $\sigma \in \Sigma$ is $(1, 1)$-stuttering
if $\sigma \in F$ or $Y(\sigma) \notin F$. Otherwise, it is $(2, 1)$-stuttering.


To obtain a partitioned simulation, switch states (along any trace), which start new
segments, are further required to be squeezed into initial states (since our complexity
analysis only applies to initial states). We denote by $S_{p_d}(\tau)$ the switch states of trace
$\tau$ according to partition $p_d$ and by $S_{p_d}$ the switch states of all traces according to the
partition $p_d$. Namely, $S_{p_d} = init \cup \{tr(\sigma) \mid \sigma \in reach \wedge p_d(\sigma) < p_d(tr(\sigma))\}$.


Definition 8 (Partitioned Simulation). We say that a squeezer $Y : \Sigma \to \Sigma$ forms a
$\{(h_i, \ell_i)\}_{i=1}^{n}$-partitioned simulation according to $p_d$, denoted $Y \sim PS_{p_d}(\{(h_i, \ell_i)\}_{i=1}^{n})$,
if for every reachable state $\sigma$ we have that:


- $\sigma$ is $(h_i, \ell_i)$-stuttering for some $1 \le i \le n$, and
- $\sigma \in S_{p_d} \Rightarrow Y(\sigma) \in init$.


Note that Definition 7 implies that a non-terminal state may only be squeezed into a
terminal state if it is the last state in its segment. When $\{(h_i, \ell_i)\}_{i=1}^{n}$ is irrelevant or
clear from the context, we omit it from the notation and simply write $Y \sim PS_{p_d}$.


4 Considering a non-terminal last state that is squeezed into a terminal state as $(1, 0)$-stuttering
may have been more intuitive than $(2, 1)$-stuttering, but both properly capture the discrepancy
between the number of transitions in the higher and lower rank traces, and $(2, 1)$ better fits the
rest of the technical development, which assumes that $h_i, \ell_i \ge 1$.

A trace squeezed by $Y \sim PS_{p_d}(\{(h_i, \ell_i)\}_{i=1}^{n})$ may have an unbounded number of
$(h_i, \ell_i)$-stuttering states, which hinders the ability to define a recurrence relation based
on the simulation. To overcome this, our complexity decomposition may use $\hat{k} \ge 1$
to capture a common multiplicative factor of all the stuttering pairs, with the aim of
leaving only a bounded number of states whose stuttering exceeds $\hat{k}$ and needs to be
added separately. This will become important in Theorem 1.


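Observation 1 (Complexity decomposition) Let $Y \sim PS_{p_d}(\{(h_i, \ell_i)\}_{i=1}^{n})$, and $\hat{k} \ge 1$.
Let $E_{\hat{k}} \subseteq \{1, \dots, n\}$ be the set of indices such that $h_i > \ell_i \cdot \hat{k}$. Then for every $\sigma_0 \in init$
we have that


$$comp_s(\sigma_0) \le \sum_{\sigma \in S_{p_d}(\tau(\sigma_0))} \hat{k} \cdot comp_s(Y(\sigma)) \;+\; \sum_{i \in E_{\hat{k}}} \; \sum_{\sigma \in K_i(\tau(\sigma_0))} (h_i - \ell_i \cdot \hat{k})$$


where $K_i(\tau(\sigma_0))$ is the multiset of $(h_i, \ell_i)$-stuttering states in $\tau(\sigma_0)$.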


In the observation, the first addend summarizes the complexity contributed by all the
lower-rank traces, while using $\hat{k}$ as an upper bound on the “inflation” of the traces.
However, the states that are $(h_i, \ell_i)$-stuttering with $h_i$ that exceeds $\hat{k}$ contribute
additional $h_i - (\ell_i \cdot \hat{k})$ steps to the complexity, and as a result, need to be taken into account
separately. This is handled by the second addend, which adds the steps that were not
accounted for by the first addend. While we use the same inflation factor $\hat{k}$ across the
entire trace, a simple extension of the decomposition property may consider a different
factor $\hat{k}$ in each segment. Note that the first addend always sums over a finite number
of elements since the number of switch states is at most $d$, the number of segments. If
$\tau(\sigma_0)$ is finite, the second addend also sums over a finite number of elements.

Observation 1 considers the complexity function over states, and is oblivious to
the rank. In particular, it does not rely on the squeezer decreasing the rank of states.
Next, we use this observation as the basis for extracting a recurrence relation for the
complexity function over ranks, in which case, decreasing the rank becomes important.


3.3 Extraction of recurrence relations over ranks 


Based on the complexity decomposition, we define recurrence relations that capture
$comp_x$, the time complexity of the initial states as a function of their ranks. To go
from the complexity as a function of the actual states (as in Observation 1) to the complexity
as a function of their ranks, we need to express the rank of $Y(\sigma_s)$ for a switch
state $\sigma_s$ as a function of the rank of $\sigma_0$. To this end, we define $\hat{Y}$:


Definition 9. Given $r$, $Y$ and $p_d$ such that $Y \sim PS_{p_d}$, a function $\hat{Y} : X \times \{1, \dots, d\} \to
X$ is a rank bounding function if for every $\rho \in X \setminus B$ and $1 \le i \le d$, if $\tau(\sigma_0)$ is an
initial trace such that $r(\sigma_0) = \rho$, and $\sigma_s \in S_{p_d}(\tau(\sigma_0))$ is a switch state such that
$p_d(\sigma_s) = i$, the following holds:


(i) upper bound: $r(Y(\sigma_s)) \preceq \hat{Y}(\rho, i)$ and (ii) rank decrease: $\hat{Y}(\rho, i) \prec \rho$

In other words, Definition 9 requires that for every non-base initial state $\sigma_0 \in init$ and
switch state $\sigma_s$ at segment $i$ of $\tau(\sigma_0)$, we have that $r(Y(\sigma_s)) \preceq \hat{Y}(r(\sigma_0), i) \prec r(\sigma_0)$.
Recall that $r(Y(\sigma_s))$ is well defined since $Y(\sigma_s)$ is required to be an initial state. The
definition states that $\hat{Y}(\rho, i)$ provides an upper bound on the rank of squeezed switch
states in a non-base trace of rank $\rho$. The inequality $comp_x(r(Y(\sigma_s))) \le comp_x(\hat{Y}(\rho, i))$ is ensured by
the monotonicity of $comp_x$. This definition also requires the rank of non-base traces to
strictly decrease when they are squeezed, as captured by the “rank decrease” inequality.

Obtaining a rank bounding function, or even verifying that a given $\hat{Y}$ satisfies this
requirement, is a challenging task. We return to this question later in this section.

These conditions allow us to replace the states by ranks in the first addend of Observation 1
and hence obtain recurrence relations for $comp_x$ over the (decreasing) ranks.
To handle the second addend, we also need to bound the number of states whose stuttering,
$h_i$, exceeds $\hat{k}$. This is summarized by the following theorem:


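Theorem 1. Let $r : init \to X$ be a rank function, $Y : \Sigma \to \Sigma$ a squeezer and
$p_d : \Sigma \to \{1, \dots, d\}$ a partition function such that $Y \sim PS_{p_d}(\{(h_i, \ell_i)\}_{i=1}^{n})$. Let
$\hat{Y} : X \times \{1, \dots, d\} \to X$ be a rank bounding function w.r.t. $r$, $Y$ and $p_d$. If, for some
$\hat{k} \ge 1$, the number of $(h_i, \ell_i)$-stuttering states that appear along any non-base initial
trace is bounded by a constant $b_i \in \mathbb{N}$ whenever $i \in E_{\hat{k}}$, then


$$comp_x(\rho) \le \sum_{i=1}^{d} \hat{k} \cdot comp_x(\hat{Y}(\rho, i)) \;+\; \sum_{i \in E_{\hat{k}}} b_i \cdot (h_i - \ell_i \cdot \hat{k}). \tag{6}$$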


Note that a state may be $(h_i, \ell_i)$-stuttering for several $i$’s, in which case, it is sound
to count it towards any of the $b_i$’s; in particular, we choose the one that minimizes
$h_i - \ell_i \cdot \hat{k}$.


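Corollary 1. Under the premises of Theorem 1, if $f : X \to \mathbb{N} \cup \{\infty\}$ satisfies $f(\rho) =
\sum_{i=1}^{d} \hat{k} \cdot f(\hat{Y}(\rho, i)) + \sum_{i \in E_{\hat{k}}} b_i \cdot (h_i - \ell_i \cdot \hat{k})$ for every $\rho \in X \setminus B$, and $comp_x(\rho) \le
f(\rho)$ for every $\rho \in B$, then $comp_x(\rho) \le f(\rho)$ for every $\rho \in X$. We conclude that
$comp_s(\sigma_0) \le f(r(\sigma_0))$ for every $\sigma_0 \in init$.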
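
A function $f$ of the shape required by Corollary 1 can be evaluated by memoized recursion when ranks are natural numbers. The sketch below is an illustration under stated assumptions, not the authors' implementation: `make_bound` is a hypothetical helper, `surplus` stands for the constant sum $\sum_{i \in E_{\hat{k}}} b_i \cdot (h_i - \ell_i \cdot \hat{k})$, and `base` maps each base rank to an assumed bound on $comp_x$. Termination relies on the rank-decrease property of $\hat{Y}$.

```python
from functools import lru_cache


def make_bound(d, k_hat, y_hat, surplus, base):
    """Build f with f(rho) = sum_{i=1..d} k_hat * f(y_hat(rho, i)) + surplus.

    For rho in `base`, f(rho) is the given base bound. The recursion
    terminates only if y_hat(rho, i) is strictly below rho and eventually
    reaches a base rank (the rank-decrease requirement).
    """
    @lru_cache(maxsize=None)
    def f(rho):
        if rho in base:
            return base[rho]
        return sum(k_hat * f(y_hat(rho, i)) for i in range(1, d + 1)) + surplus
    return f


# Instance mirroring the two-segment recurrence of the overview:
# d = 2, k_hat = 1, y_hat(n, i) = n - 1, surplus 2, assumed base f(0) = 0.
f = make_bound(d=2, k_hat=1, y_hat=lambda rho, i: rho - 1, surplus=2, base={0: 0})
```

With this instance, `f(n)` satisfies `f(n) = 2 * f(n - 1) + 2`, so its values agree with $2^{n+1} - 2$ under the assumed base case.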


Base-case complexity In order to apply Cor. 1 we need to accompany Eq. (6) with a
bound on $comp_x(\rho)$ for the base ranks, $\rho \in B$. Fortunately, this is usually a significantly
easier task. In particular, the running time of the base cases is often constant, because
intuitively, the following are correlated: (a) the rank, (b) the size of the underlying data
structure, and (c) the number of iterations. In this case, symbolic execution may be
used to obtain bounds for base cases (as we do in our work). In essence, any method
that can yield a closed-form expression for the complexity of the base cases is viable.
In particular, we can apply our technique on the base case as a subproblem.


3.4 Establishing the requirements of the recurrence relations extraction 


Theorem 1 defines a recurrence relation from which an upper bound on the complexity
function, $comp_x$, can be computed (Cor. 1). However, to ensure correctness, the
premises of Theorem 1 must be verified. The requirement that $Y \sim PS_{p_d}(\{(h_i, \ell_i)\}_{i=1}^{n})$
(see Definition 8) may be verified locally by examining individual (reachable) states:
for any (reachable) state $\sigma$, the check for $(h_i, \ell_i)$-stuttering and switch states can, and
should, be done in tandem, and requires observing at most $\max_i h_i$ transition steps
from $\sigma$ and $\max_i \ell_i$ from $Y(\sigma)$. In contrast, the property required of $\hat{Y}$ is global: it
requires $\hat{Y}(\rho, i)$ to provide an upper bound on the rank of any squeezed switch state that
may occur in any position along any non-base initial trace whose initial state has rank
$\rho$. Similarly, the property required of the bounds $b_i$ is also global: that the number of
$(h_i, \ell_i)$-stuttering states along any non-base initial trace is at most $b_i$. It is therefore not
clear how these requirements may be verified in general. We overcome this difficulty
by imposing additional restrictions, as we discuss next.


Establishing bounds on the number of occurrences of stuttering states Bounds on
the number of occurrences per trace that are sound for every trace are difficult to obtain
in general. While clever analysis methods exist that can do this kind of accounting, we
found that a stronger, simpler condition applies in many cases:


- For every $\sigma \in reach$, either:
  • $\sigma$ is $(h_i, \ell_i)$-stuttering with $h_i \le \hat{k}$; or
  • $\sigma$ is $(h_i, \ell_i)$-stuttering (with $h_i > \hat{k}$), and either $\sigma$ is a switch state or $tr^{h_i}(\sigma)$
    is a last state.


This restricts these cases to occur only at the beginnings and ends of segments. It
implies a total bound of $2d \cdot \max_i (h_i - \ell_i \cdot \hat{k})$ on the “surplus” of any trace; therefore,
we substitute this expression for the rightmost sum in Eq. (6).


Validating a rank bounding function The definition of a rank bounding function
(Definition 9) encapsulates two parts. Part (ii) ensures that the rank decreases: $\hat{Y}(\rho, i) \prec
\rho$ for every $\rho \in X \setminus B$. Verifying that this requirement holds does not involve any
reasoning about the states, nor traces, of the transition system. Part (i) ensures that $\hat{Y}$
provides an upper bound on the rank of squeezed switch states. Formally, it requires
that $r(Y(\sigma_s)) \preceq \hat{Y}(r(\sigma_0), i)$ for every switch state $\sigma_s$ in segment $i \in \{1, \dots, d\}$ along
a trace that starts from a non-base initial state $\sigma_0$. Namely, it relates the rank of the
squeezed switch state, $Y(\sigma_s)$, to the rank of the initial state, $\sigma_0$, where no bound on the
length of the trace between the initial state $\sigma_0$ and the switch state $\sigma_s$ is known a priori.
As such, it involves global reasoning about traces. We identify two cases in which such
reasoning may be avoided: (i) The partition $p_d$ consists of a single segment (i.e., $d = 1$);
or (ii) The rank function extends to any state (and not just the initial states), while being
preserved by $tr$. In both of these cases, we are able to verify the correctness of $\hat{Y}$ locally.


A single segment. In this case, the only switch state along a trace is the initial state, and
hence the upper-bound requirement of $\hat{Y}$ boils down to the requirement that for every
$\sigma_0 \in init$ such that $r(\sigma_0) \in X \setminus B$, we have that $r(Y(\sigma_0)) \preceq \hat{Y}(r(\sigma_0), 1)$.


Lemma 1. Let $r$, $Y$ and $p_1 : \Sigma \to \{1\}$ such that $Y \sim PS_{p_1}$. Then $\hat{Y} : X \times \{1\} \to
X$ satisfies the upper-bound requirement of a rank bounding function if and only if
$r(Y(\sigma_0)) \preceq \hat{Y}(r(\sigma_0), 1)$ for every $\sigma_0 \in init$ such that $r(\sigma_0) \in X \setminus B$.
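
The local condition of Lemma 1 only inspects initial states, so on a small finite system it can be checked by enumeration. The sketch below uses brute force as a stand-in for the SMT-based check mentioned in the text; the helper name `upper_bound_violations`, the toy bit-tuple states, and the use of integer ranks ordered by the usual order on $\mathbb{N}$ are all assumptions of this illustration.

```python
from itertools import product


def upper_bound_violations(init_states, rank, squeeze, y_hat, base):
    """Initial states of non-base rank violating r(Y(s0)) <= y_hat(r(s0), 1).

    Ranks are assumed to be integers compared with the usual order.
    """
    return [
        s0 for s0 in init_states
        if rank(s0) not in base and rank(squeeze(s0)) > y_hat(rank(s0), 1)
    ]


# Hypothetical toy instance: states are bit tuples of length 0..4, the rank
# is the tuple length, and the squeezer drops the last bit.
INIT = [bits for n in range(5) for bits in product((0, 1), repeat=n)]

ok = upper_bound_violations(
    INIT, len, lambda s: s[:-1], lambda rho, i: rho - 1, base={0})
bad = upper_bound_violations(
    INIT, len, lambda s: s[:-1], lambda rho, i: rho - 2, base={0})
```

Here `ok` is empty because dropping one bit lowers the rank by exactly one, whereas the too-optimistic candidate `rho - 2` is rejected by every non-base initial state. The rank-decrease part of Definition 9 is a separate check on `y_hat` alone.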

Rank preservation. Another case in which the upper-bound property of $\hat{Y}$ may be verified
locally is when $r$ can be extended to all states while being preserved by $tr$:


Definition 10. A function $\hat{r} : \Sigma \to X$ extends the rank function $r : init \to X$ if $\hat{r}$
agrees with $r$ on the initial states, i.e., $\hat{r}(\sigma_0) = r(\sigma_0)$ for every initial state $\sigma_0 \in init$.
The extended rank function $\hat{r}$ is preserved by $tr$ if for every reachable state $\sigma$, we have
that $\hat{r}(tr(\sigma)) = \hat{r}(\sigma)$.


Preservation of $\hat{r}$ by $tr$ ensures that all states along a (reachable) trace share the
same rank. In particular, for a reachable switch state $\sigma_s$ that lies along $\tau(\sigma_0)$, rank
preservation ensures that $\hat{r}(\sigma_s) = \hat{r}(\sigma_0) = r(\sigma_0)$ (the last equality is due to the extension
property), allowing us to recover the rank of $\sigma_0$ from the rank of $\sigma_s$. Therefore, the
upper-bound requirement of $\hat{Y}$ simplifies into the local requirement that for every reachable
switch state $\sigma_s$ such that $\hat{r}(\sigma_s) \in X \setminus B$, we have that $\hat{r}(Y(\sigma_s)) \preceq \hat{Y}(\hat{r}(\sigma_s), i)$,
for every $i \in \{1, \dots, d\}$.


Lemma 2. Let $r$, $Y$ and $p_d : \Sigma \to \{1, \dots, d\}$ such that $Y \sim PS_{p_d}$. Suppose that
$\hat{r} : \Sigma \to X$ extends $r$ and is preserved by $tr$. Then $\hat{Y} : X \times \{1, \dots, d\} \to X$ satisfies
the upper-bound requirement of a rank bounding function if and only if $\hat{r}(Y(\sigma_s)) \preceq
\hat{Y}(\hat{r}(\sigma_s), i)$ for every reachable switch state $\sigma_s$ such that $\hat{r}(\sigma_s) \in X \setminus B$ and for every
$i \in \{1, \dots, d\}$.


Remark 1. The notion of a partitioned simulation requires a switch state $\sigma_s$ to be
squeezed into an initial state. This requirement may be relaxed into the requirement that
$\sigma_s$ is squeezed into a reachable state $Y(\sigma_s)$, provided that we are able to still ensure
that the rank of (some) initial state $\sigma_0'$ leading to $Y(\sigma_s)$ is smaller than the rank of the
trace on which $\sigma_s$ lies, and that the rank of $\sigma_0'$ is properly captured by $\hat{Y}$. One case in
which this is possible is when $r$ is extended to $\hat{r}$ that is preserved by $tr$, as in this case
$\hat{r}(Y(\sigma_s)) = \hat{r}(\sigma_0') = r(\sigma_0')$.


This subsection described local properties that ensure that a given program satisfies
the requirements of Theorem 1. The locality of the properties facilitates the use of SMT
solvers to perform these checks automatically. This is a key step for effective application
of the method.


3.5 Trace-length vs. state-size recurrences with squeezers 


A plethora of work exists for analyzing the complexity of programs (see Section 6 for a
discussion of related works). Most existing techniques for automatic complexity analysis
aim to find a recurrence relation on the length of the execution trace, relating the
length of a trace from some state to the length of the remaining trace starting at its
successor. These are recurrences on time, if you will, whereas our approach generates
recurrences on the state size (captured by the rank). Is our approach completely orthogonal
to preceding methods? Not quite. It turns out that from a conceptual point of view,
our approach can formulate a recurrence on time as well, as we demonstrate in this
section.

Obtaining trace-length recurrences based on state squeezers The key idea is to use
$tr$ itself as a squeezer that squeezes each state into its immediate successor. Putting
aside the initial-anchor requirement momentarily, such a squeezer forms a partitioned
simulation with a single segment (i.e., $p_d = 1$), in which all the states along a trace are
$(1, 1)$-stuttering, except for the last one (if the trace is finite), which is $(2, 1)$-stuttering.
Recall that squeezers must also preserve initial states (see Definition 8), a property that
may be violated when $Y = tr$, as the successor of an initial state is not necessarily an
initial state. We restore the initial-anchor property by setting $init = \Sigma$, i.e., every state
is considered an initial state⁵.

A consequence of this definition is that $comp_x$ will now provide an upper bound
on the time complexity of every state and not only of the initial states, in terms of a
rank that needs to be defined. If we further define a rank-bounding function $\hat{Y}$ we may
extract a recurrence relation of the form


$$comp_x(\rho) = comp_x(\hat{Y}(\rho)) + 1$$


(we use $\hat{Y}(\rho)$ as an abbreviation of $\hat{Y}(\rho, 1)$, since this is a special case where $d = 1$).


Defining the rank and the rank bounding function Recall that the rank $r : \Sigma \to
X$ captures the features of the (initial) states that determine the complexity. To allow
maximal precision, especially since all states are now initial, we set $X$ to be the set
of states $\Sigma$, and define $r$ to be the identity function, $r(\sigma) = \sigma$. With this definition,
$comp_x$ and $comp_s$ become one. Next, we need to define $\prec$ and $B$, while ensuring that $Y$
squeezes the (non-base) initial states, which are now all the states, into states of a lower
rank according to $\prec$. Since squeezers act like transitions now, having that $Y = tr$, they
have the effect of decreasing the number of transitions remaining to reach a terminal
state (provided that the trace is finite). We use this observation to define $\prec\ \subseteq \Sigma \times \Sigma$.
Care is needed to ensure that $(\Sigma, \prec)$ is well-founded, i.e., every descending chain is
finite, even though the program may not terminate. Here is the definition that achieves
this goal:

$$\sigma_1 \prec \sigma_2 \iff comp_s(\sigma_1) < comp_s(\sigma_2) \tag{7}$$


Since $Y = tr$ does not decrease $comp_s$ for states that belong to infinite (non-terminating)
traces ($comp_s(Y(\sigma)) = comp_s(\sigma) = \infty$, hence $Y(\sigma) \nprec \sigma$), they must be
included in $B$, together with the terminal states, which are minimal w.r.t. $\prec$. Namely,
$B = F \cup \{\sigma \mid comp_s(\sigma) = \infty\}$. Technically, this means that the base of the recurrence
needs to define $comp_s$ for these states.

The final piece in the puzzle is setting $\hat{\curlyvee} = tr$. Since $\curlyvee \sim PS_{p_d}(\{(1, 1), (2, 1)\})$
(when $init = \Sigma$), where the number of (2, 1)-stuttering states that appear along any


non-base initial trace is bounded by 1, we may use Theorem||| setting k = 1, to derive 
the following recurrence relation, which reflects induction over time: 


$$comp_x(\sigma) = comp_x(tr(\sigma)) + 1.$$
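
As a concrete illustration of induction over time, the following Python sketch unfolds the recurrence above on a toy transition system. The countdown system, the names `trace_length`, `tr` and `is_terminal`, the step cap, and the convention that terminal states cost 0 are all assumptions of this sketch; they are not taken from the framework or from the tool.

```python
from typing import Callable, Optional, TypeVar

S = TypeVar("S")


def trace_length(
    state: S,
    tr: Callable[[S], S],
    is_terminal: Callable[[S], bool],
    max_steps: int = 10_000,
) -> Optional[int]:
    """Unfold comp(s) = comp(tr(s)) + 1 iteratively.

    Assumption of this sketch: terminal states cost 0. Returns None when
    the step cap is hit, standing in for a possibly infinite trace.
    """
    steps = 0
    while not is_terminal(state):
        if steps >= max_steps:
            return None
        state = tr(state)
        steps += 1
    return steps


# Hypothetical toy system: a counter that decreases until it reaches 0.
def countdown_tr(x: int) -> int:
    return x - 1


def countdown_done(x: int) -> bool:
    return x <= 0
```

For any non-terminal state of the toy system, the value computed for a state is one more than the value computed for its successor, which is exactly the shape of the recurrence. States on non-terminating traces never reach the base case, which is why the text places them in B.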


⁵ In fact, it suffices to consider init = reach, in which case we may be able to take advantage
of information from static analyses.

The formulation above represents a degenerate, naive choice of ingredients for the
sake of a theoretical construction, whose purpose is to lay the foundation for a general 
framework that takes its strengths from both induction over time and induction over 
rank. This construction does not exploit the full flexibility of our framework. In partic- 
ular, ranking functions obtained from termination proofs, as used in [5], may be used to 
augment the rank in this setting. Further, invariants inferred from static analysis can be 
used to refine the recurrences. 


4 Synthesis 


So far we have assumed that the rank function r, partition function $p_d$, squeezer $\curlyvee$
and rank bounding function $\hat{\curlyvee}$ are all readily available. Clearly, they are specific to
a given program. It would be too tedious for a programmer to provide these functions
for the analysis of the underlying complexity. In this section we show how to automate
the process of obtaining $(r, p_d, \curlyvee, \hat{\curlyvee})$ for a class of typical looping programs. We take
advantage of the fact that these components are much more compact than other kinds
of auxiliary functions commonly used for resource analysis, such as monotonically
decreasing measures used as ranking functions. For example, a ranking function for the
binary counter program from Section 2 is:


m(n, i,c) = n- YY- elj] +(Ž—1)+(n— i) 
j=0 


whereas the rank, partition, squeezer $\curlyvee$ and rank bounding function $\hat{\curlyvee}$ are


r(n,i,c) =n Y(n,i,c) = (n-1,(i>n)?i—1:i cln -— 1]) 
¥(p)=p-1 pami) = (i> n||n—1)) 22:1 

This enables the use of a relatively naive enumerative approach of multi-phase generate- 
and-test, employing some early pruning to discard obviously non-qualifying candidates. 


4.1 SyGuS 


The generation step of the synthesis loop applies syntax-guided synthesis (SyGuS [7]).
As with any SyGuS method, defining the underlying grammars is more art than science.
A grammar should be expressive enough to capture the desired terms, but strict enough to
effectively bound the search space.

Ranks are taken from $\mathbb{N}^m$ where $m \in \{1, 2, 3\}$ and $\prec$ is the usual lexicographic
order. The rank function r comprises one expression for each coordinate, constructed
by adding and subtracting integer variables and array sizes. Boolean variables are not used
in rank expressions.

Partition functions $p_d$. Our implementation currently supports a maximum number
of two segments. This means that the partition function only assigns the values 1 and 2,
and we synthesize it by generating a condition over the program’s variables, cond, that
selects between them: $p_d(\sigma) = cond(\sigma)\ ?\ 2 : 1$. Handling up to two segments is not an
inherent limitation, but we found that for typically occurring programs, two segments
are sufficient.

Squeezers $\curlyvee$ are the only ingredient that requires substantial synthesis effort. We
represent squeezers as small loop-free imperative programs, which are natural for
representing state transformations. We use a rather standard syntax with ‘if-then-else’ and
assignments, plus a remove-adjust operation that removes array entries and adjusts
indices relating to them accordingly.

Rank bounding functions $\hat{\curlyvee}$. With a well-chosen squeezer $\curlyvee$, it suffices to consider
quite simple rank bounds for the mini-traces. Hence, the rank bounds defined by $\hat{\curlyvee}$ are
obtained by adding, subtracting and multiplying variables with small constants (for each
coordinate of the rank). Similar to the choice of ranks, targeting simple expressions for
$\hat{\curlyvee}$ helps reduce the complexity of the final recurrence that is generated from the process.


4.2 Verification 


For the sake of verifying the synthesized ingredients, we fix a set $\{(h_i, \ell_i)\}$ of
stuttering shapes, and check the requirements of Theorem 1, as discussed in Section 3.4. In
particular, we check that $p_d$ is weakly monotone, i.e., that cond cannot change from
true to false in any step of tr. Note that some of the properties may be used to
discriminate some of the ingredients independently of the others. For example, the simulation
requirement only depends on $\curlyvee$ and $p_d$.
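
The weak monotonicity requirement on the partition can be illustrated on a single concrete trace. The Python sketch below is only a test-style approximation: the tool described here discharges the check symbolically for every step of tr, whereas this code inspects one finite trace. The two-phase toy loop and the names `make_partition`, `is_weakly_monotone_along` and `two_phase_trace` are hypothetical and introduced purely for illustration.

```python
from typing import Callable, Iterable, List, Tuple

State = Tuple[int, int]  # (x, y) for a hypothetical two-phase loop


def make_partition(cond: Callable[[State], bool]) -> Callable[[State], int]:
    """Build p_d(s) = cond(s) ? 2 : 1 from a candidate condition."""
    return lambda s: 2 if cond(s) else 1


def is_weakly_monotone_along(
    trace: Iterable[State], cond: Callable[[State], bool]
) -> bool:
    """Check that cond never flips from true back to false along the trace."""
    seen_true = False
    for s in trace:
        holds = cond(s)
        if seen_true and not holds:
            return False
        seen_true = seen_true or holds
    return True


def two_phase_trace(n: int, m: int) -> List[State]:
    """Toy loop: first x counts up to n, then y counts up to m."""
    x, y = 0, 0
    trace = [(x, y)]
    while x < n or y < m:
        if x < n:
            x += 1
        else:
            y += 1
        trace.append((x, y))
    return trace
```

On the trace produced by `two_phase_trace(3, 2)`, the candidate condition `x >= 3` passes the check and yields a partition that is 1 on the first phase and 2 on the second, while a condition such as "x is odd" is rejected because it flips back to false.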


Unbounded verification. Once candidates pass a preliminary screening phase, they are
verified by encoding the program and all the components $r, p_d, \curlyvee, \hat{\curlyvee}$ as first-order logic
expressions, and using an SMT solver (Z3 [13]) to verify that the requirements are
fulfilled for all traces of the program.

As mentioned in Section 3.4, all the checks are local and require observing a bounded
set of steps starting from a given $\sigma$. The only facet of the criteria that is difficult to
encode is the fact that they are required only of the reachable states (and not of every state).
Of course, if we are able to ascertain that they are met for all $\sigma \in \Sigma$, including unreachable
states, then the result is sound. However, for some programs and squeezers, the required
properties (esp., simulation) do not hold universally, but are violated by unreachable
states. To cope with this situation without having to manually provide invariants that
capture properties of the reachable states, we use a CHC solver, Spacer [23], which
is part of Z3, to check whether all the reachable states in the unbounded-state system
induced by the input program satisfy these properties. This can be seen as a reduction
from the problem of verifying the premises of Theorem 1 to that of verifying a safety
property.


5 Empirical Evaluation 


We implemented our complexity analyzer as a publicly available tool, SqzComp, that
receives a program in a subset of C and produces recurrence relations. SqzComp is
written in C++, using the Z3 C++ API [13], and using Spacer via its SMTLIB2-compatible
interface. Since our squeezers may remove elements from arrays, we initially encoded
arrays as SMT sequences. However, we found that it is beneficial to
restrict squeezers to only remove the first or last elements of an array, resulting in a
more efficient encoding with the theory of arrays. For the base case of generated
recurrences, we use the symbolic execution engine KLEE to bound the total number of
iterations by a constant.

Description                  Real                Inferred bound                          SqzComp
                             complexity          CoFloCo             SqzComp             Time       d
array: max value             O(|A|)              O(|A|)              O(|A|)              < 1 sec    1
array: min value             O(|A|)              O(|A|)              O(|A|)              < 1 sec    1
array: find first            O(|A|)              O(|A|)              O(|A|)              < 1 sec    1
array: find last             O(|A|)              O(|A|)              O(|A|)              < 1 sec    1
array: is-sorted             O(|A|)              O(|A|)              O(|A|)              < 1 sec    1
array: longest asc. prefix   O(|A|)              O(|A|)              O(|A|)              < 1 sec    1
array: binary search         O(log(|A|))         O(log(|A|))         O(log(|A|))         < 1 sec    1
gcd                          O(max(x, y))        O(x + y)            O(x + y)            < 1 sec    1
two-phase loop 1             O(2n - 2x + y)      O(2n - 2x + y)      O(2n + 2y)          < 1 sec    1
two-phase loop 2             O(n - x + m - y)    O(n - x + m - y)    O(n - x + m - y)    < 1 sec    1
two-phase loop 3             O(n)                O(n)                O(n)                < 1 sec    1
two-phase loop 4             O(2n - x - z)       O(2n - x - z)       O(2n)               < 1 sec    1
multi-path loop 1            O(n)                O(3n)               O(n)                < 1 sec    1
multi-path loop 2            O(n)                O(n)                O(n)                < 1 sec    1
multi-path loop 3            O(n)                O(n)                O(n)                < 1 sec    1
tricky init loop             O(z)                O(z)                O(z)                4 min      1
nested loop 1                O(|x - y|)          O(|x - y|)          O(x + y)            < 1 sec    1
nested loop 2                O(a^2)              O(a^2)              O(a^2)              16 min     1
context sensitive loop       O(max(n - m, m))    O(max(n - m, m))    O(n)                7 min      1
binary counter               O(2^(n+1))          ∞                   O(2^(n+1))          34 min     2
subsets                      o)                  ∞                   o((";""))           50 min     2
monotone sequences           oG)                 ∞                   OD)                 50 min     2

Table 1. Experimental results. In array programs, A denotes an array. x, y, z, n, m, k, a are
integer variables.


5.1 Experiments 


We evaluated our tool, SqzComp, on a variety of benchmark programs taken from [16],
as well as three additional programs: the binary counter example from Section 2, a
subsets example, described in Section 5.2, and an example computing monotone
sequences. These examples exhibit intricate time complexities. From this benchmark suite
we filtered out non-deterministic programs, as well as programs that violate
syntactic constraints imposed by our frontend. We compared SqzComp to
CoFloCo [16], the state-of-the-art tool for complexity analysis of imperative programs.

Table 1 summarizes the results of our experiments. The first column presents the
name of the program, which describes its characteristics (each of the “two-phase loop”
programs consists of a loop with an if statement, where the branch executed changes
starting from some iteration). The second column specifies the real complexity, while
the following two columns present the bounds inferred by CoFloCo and by SqzComp,
respectively. (For SqzComp, the reported bounds are the solutions of the recurrences
output by the tool.) The last two columns present the analysis running time and the
number of segments used in the analysis by SqzComp.

1  void subsets(uint n, uint k, uint m) {
2    uint I[k]; int j = 0; bool f = true;
3    while (j >= 0) {
4      if (j >= k)            /*start left scan*/  { f=false; j--; }
5      else if (j==0 && f)    /*init*/             { f=true; I[0]=m; j++; }
6      else if (f)            /*right fill*/       { f=true; I[j]=I[j-1]+1; j++; }
7      else if (I[j]>=n-k+j)  /*left scan*/        { f=false; j--; }
8      else                   /*start right fill*/ { f=true; I[j]=I[j]+1; j++; }
9  } }

   squeezer(uint I[], uint n, uint k, uint m, int j, bool f) {
     if (I[0]==m && j>0) { m++; remove I[0]; k--; j--; }
     else if (I[0]==m)   { m++; remove I[0]; k--; }
     else                { m++; }
   }

Fig. 4. An example program that produces all subsets of {m, ..., n − 1} of size k; below is the
synthesized squeezer.

CoFloCo’s analysis time is always in the order of magnitude of 0.1 second, whether
or not it succeeds in finding a complexity bound. Our analysis is considerably slower,
mostly due to the naive implementation of the synthesizer. When both CoFloCo and
SqzComp succeed, the bounds inferred by CoFloCo are sometimes tighter.

However, SqzComp manages to find tight complexity bounds for the new examples, 
which are not solved by CoFloCo, and to the best of our knowledge, are beyond reach 
of existing tools. (We also encoded the new examples as OCaml programs and ran the 
tool of on them, and it failed to infer bounds.) 


5.2 Case study: Subsets example 


This subsection presents one challenging example from our benchmarks, the subsets 
example, and the details of its complexity analysis. Notably, our method is able to infer 
a binomial bound, which is asymptotically tight. 

The code, shown in Fig. 4, iterates over all the subsets of {m,...,n-1} of size k.
The “current” subset is maintained in an array I whose length is k, and which is always
sorted, thus avoiding generating the same set more than once. The first k iterations of
the loop fill the array with values {m,m+1,...,m+k-1}, which represent the first subset
generated. This is taken care of by the branches at lines 5 and 6, which perform a “right fill”
phase, filling in the array with an ascending sequence starting from m at I[0]. Once the
first k iterations are done, j reaches the end of the array (j=k) and so the next iteration
will execute line 4, turning off the flag f, signifying that the array should now be scanned
leftwards. In each successive iteration, j is decreased, looking for the rightmost element
that can be incremented. For example, if n = 8, I = [2, 6, 7], this rightmost element is
I[0] = 2. After that element is incremented, the flag f is turned on again, completing
the “left scan” phase and starting a “right fill” phase.

[Figure 5: an upper trace σ0, ..., σ16 of the subsets program for n=6, m=2, k=3, related by
the squeezer to two lower traces, for n=6, m=3, k=2 and for n=6, m=3, k=3.]

Fig. 5. An illustration of the 2-partitioned simulation for the subsets example. In the univariate
case, the rank of the upper trace is n − m and that of the lower traces is n − m − 1. In the
multivariate case, the upper trace is of rank (n − m, k), lower traces of ranks (n − m − 1, k − 1),
(n − m − 1, k).


A univariate recurrence. Consider the rank function $r(I, n, k, m, j, f) = n - m$,
defined with respect to $(\mathbb{N}, <)$, and the squeezer shown below the program in Fig. 4. The
squeezer observes the first element of the array: if it is equal to m (the lower bound of
the range), it removes it from the array, shrinking its size (k) by one. It then adjusts the
index j to keep pointing to the same element; unless j = 0, in which case that element is
removed. This squeezer forms a 2-partitioned simulation, as illustrated by the traces in
Fig. 5. All states are (1, 1)-stuttering, except for $\sigma_9$, which is (2, 1)-stuttering, as caused
by the removal of I[0] when j = 0. The rank bounding function is $\hat{\curlyvee}(i, \rho) = \rho - 1$ for
$i \in \{1, 2\}$. We therefore obtain the following recurrence relation:

$$comp_x(\rho) \le 1 + comp_x(\rho - 1) + comp_x(\rho - 1).$$

The base of the recurrence is $comp_x(0) = 1$, leading to the solution $comp_x(\rho) \le 2^{\rho+1} - 1$.
This means that for an initial state, $comp_s(I, n, k, m, 0, true) \le comp_x(n - m) \le 2^{n-m+1} - 1$.
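
The univariate bound can be sanity-checked by running the program of Fig. 4 directly. The Python sketch below transliterates the loop and counts its iterations; taking the number of loop iterations as the cost and exercising only inputs with k ≤ n − m are assumptions of this sketch, and the function names are hypothetical.

```python
def subsets_iterations(n: int, k: int, m: int) -> int:
    """Count loop iterations of the subsets program (Fig. 4).

    Assumption of this sketch: the cost of a run is its number of loop
    iterations. The array I is zero-initialised instead of uninitialised.
    """
    I = [0] * k
    j, f = 0, True
    steps = 0
    while j >= 0:
        steps += 1
        if j >= k:                  # start left scan
            f = False
            j -= 1
        elif j == 0 and f:          # init
            I[0] = m
            j += 1
        elif f:                     # right fill
            I[j] = I[j - 1] + 1
            j += 1
        elif I[j] >= n - k + j:     # left scan
            f = False
            j -= 1
        else:                       # start right fill
            f = True
            I[j] += 1
            j += 1
    return steps


def univariate_bound(rank: int) -> int:
    """Closed form 2^(rank+1) - 1 of comp(p) <= 1 + 2*comp(p-1), comp(0) = 1."""
    return 2 ** (rank + 1) - 1


def check_bound(max_rank: int, m: int = 2) -> bool:
    """Compare iteration counts with the bound for all k <= n - m <= max_rank."""
    return all(
        subsets_iterations(m + rank, k, m) <= univariate_bound(rank)
        for rank in range(max_rank + 1)
        for k in range(rank + 1)
    )
```

For the parameters of the upper trace in Fig. 5 (n = 6, m = 2, k = 3) the loop runs 19 times under this cost model, below the bound of 31 given by the recurrence for rank n − m = 4.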


A multivariate recurrence. Consider an alternative rank definition $r(I, n, k, m, j, f) = (n - m, k)$,
defined with respect to $(\mathbb{N} \times \mathbb{N}, <)$, where ‘<’ denotes the lexicographic
order, together with the same squeezer and partition as before. The rank bounding
function is now

$$\hat{\curlyvee}((\rho_1, \rho_2), i) = \begin{cases} (\rho_1 - 1, \rho_2 - 1) & i = 1 \\ (\rho_1 - 1, \rho_2) & i = 2 \end{cases}$$

The corresponding recurrence relation is:

$$comp_x(\rho_1, \rho_2) \le 1 + comp_x(\rho_1 - 1, \rho_2 - 1) + comp_x(\rho_1 - 1, \rho_2)$$

with base $comp_x(0, \_) = 1$, resulting in a binomial closed-form bound on $comp_x(\rho_1, \rho_2)$.
That is, for an initial state, $comp_s(I, n, k, m, 0, true) \le comp_x(n - m, k)$, which is in turn
bounded by a binomial expression in n − m and k.
Interestingly, this example demonstrates that the same squeezer may yield different
recurrences, when different ranks (and rank bounding functions) are considered. It also
demonstrates a case where different segments of a trace are mapped to mini-traces of a
different rank.
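
To see how the two rank choices compare numerically, the following Python sketch evaluates the multivariate recurrence as an equality and sets it against the univariate closed form. Treating a second coordinate of 0 as an additional base case with value 1 is an assumption of this sketch (the text only states the base for a first coordinate of 0), and the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def comp_multivariate(p1: int, p2: int) -> int:
    """Evaluate f(p1, p2) = 1 + f(p1-1, p2-1) + f(p1-1, p2), f(0, _) = 1.

    Assumption of this sketch: f(_, 0) = 1 as well, so that the second
    coordinate never leaves the natural numbers.
    """
    if p1 == 0 or p2 == 0:
        return 1
    return 1 + comp_multivariate(p1 - 1, p2 - 1) + comp_multivariate(p1 - 1, p2)


def comp_univariate(p: int) -> int:
    """Closed form 2^(p+1) - 1 of the univariate recurrence."""
    return 2 ** (p + 1) - 1


def multivariate_never_worse(max_rank: int) -> bool:
    """Check f(p1, p2) <= 2^(p1+1) - 1 on a small grid of ranks."""
    return all(
        comp_multivariate(p1, p2) <= comp_univariate(p1)
        for p1 in range(max_rank + 1)
        for p2 in range(max_rank + 1)
    )
```

Under these assumptions the multivariate value never exceeds the univariate one on the grid, and it is much smaller when k is small: for rank (4, 1) it evaluates to 9, against 31 for rank 4 in the univariate case.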




6 Related Work 


This section focuses on exploring existing methods for static complexity analysis of 
imperative programs. Dynamic profiling and analysis are a separate research area, 
more related to testing, and generally do not provide formal guarantees. We further 
focus on works that determine asymptotic complexity bounds, and use the number of 
iterations executed as their cost model; we refrain from thoroughly covering previous 
techniques that analyze complexity at the instruction level. 


Static cost analysis The seminal work of defined a two steps meta-framework 
where recurrence relations are extracted from the underlying program, and then an- 
alyzed to provide closed-form upper bounds. Broadly speaking, cost relations are a 
generalized framework that captures the essence of most of the works mentioned in this 
section. 

and infer cost relations of imperative programs written in Java and C re- 
spectively. Cost relations resemble somewhat limited C procedures: They are capable of 
recursive calls to other cost relations, and they can handle non-determinism that arises 
either as a consequence of direct nondet ( ) in the program, or as a result of inherent 
imprecision of static analysis. They define for every basic block of the program its own 
cost relation function, and then form chains according to the control flow graph of the 
program. They use numerical abstract domains to support a context sensitive analysis 
of whether a chain of visits to specific basic blocks is feasible or not. Once all infeasi- 
ble chains are removed, disjunctive analysis determines an overall approximation of the 
heaviest chain, representing the max number of iterations. 

[19] uses multiple counters that are automatically inserted at various
points in the code, initialized and incremented. These ghost counters make it possible to infer an
overall complexity bound by applying appropriate abstract interpretation handling
numeric domains. [18] and apply code transformations to represent multi-path loops
and nested loops in a canonical way. Then, paths connecting pairs of “interesting” code
points $\pi_1, \pi_2$ (loop headers etc.) are identified, in a way that satisfies some
properties. For instance, $\pi_1$ is reached twice without reaching $\pi_2$. The path property induces
progress invariants, which are then analyzed to infer the overall complexity bound.

define an abstraction of the program to a size-change-graph, where transition 
edges of the control flow graph are annotated to capture sound over-approximation re- 
lations between integer variables. The graph is then searched for infinitely decreasing 
sequences, represented as words in an w-regular language. This representation concisely 
characterizes program termination. then harnesses the size-change abstraction from 
to analyze the complexity of imperative programs. First, they apply standard pro- 
gram transformations like pathwise analysis to summarize inner nested loops. Then, 
they heuristically define a set of scalar rank functions they call norms. These norms 
are somewhat similar to our rank function in the sense that they help to abstract away
program parts that do not affect its complexity. The program is then represented as a
size-change graph, and multi-path contextualization prunes subsequent transitions 
which are infeasible. 

introduces difference constraints in the context of termination, to bound variables
x' in the current iteration by some y in the previous iteration plus some constant c:
$x' \le y + c$. extends difference constraints to complexity analysis. Indeed, it is
quite often the case that ideas from the area of program termination are assimilated in 
the context of complexity analysis and vice versa. They exploit the observation that 
typical operations on loop counters like increment, decrement and resets are essentially 
expressible as difference constraints. They design an abstraction based on the domain of 
difference constraints, and obtain relevant invariants which are then used in determin- 
ing upper bounds. [10] is very similar, only that it represents a program as an integer 
transition system and allows nonlinear numerical constraints and ranking functions. 

As we mentioned earlier, all of these approaches are based on identifying the progress 
of executions over time, characterizing the progress between two given points in the 
program. In contrast, our approach allows reasoning over state size and compares whole
executions.


Squeezers. The notion of squeezers was introduced by for the sake of safety verification.
As discussed in Section 1, the challenges in complexity analysis are different,
and require additional ingredients beyond squeezers. [15]1|2] introduce well structured 
transition systems, where a well-quasi order (wqo) on the set of states induces a simu- 
lation relation. This property ensures decidability of safety verification of such systems 
(via a backward reachability algorithm). Our use of squeezers that decrease the rank 
of a state and induce a sort of a simulation relation may resemble the wqo of a well 
structured transition system. However, there are several key differences: we do not re- 
quire the order (which is defined on ranks) to be a wqo. Further, we do not require a 
simulation relation between any states whose ranks are ordered, only between a state 
and its squeezed counterpart. Notably, our work considers complexity analysis rather 
than safety verification. 


7 Conclusion 


This work introduces a novel framework for run-time complexity analysis. The frame- 
work supports derivation of recurrence relations based on inductive reasoning, where 
the form of induction depends on the choice of a squeezer (and rank bounding func- 
tion). The new approach thus offers more flexibility than the classical methods where 
induction is coupled with the time dimension. For example, when the rank captures the 
“state size”, the approach mimics induction over the space dimension, reasoning about 
whole traces, and alleviating the need to describe the intricate development of states 
over time. We demonstrate that such squeezers and rank bounding functions, which we 
manage to synthesize automatically, facilitate complexity analysis for programs that are 
beyond reach for existing methods. Thanks to the simplicity and compactness of these 
ingredients, even a rather naive enumeration was able to find them efficiently.

Abstract. We consider a hierarchy of four typed call-by-value languages
with either higher-order or ground-type references and with either call/cc 
or no control operator. 

Our first result is a fully abstract trace model for the most expressive 
setting, featuring both higher-order references and call/cc, constructed 
in the spirit of operational game semantics. Next we examine the impact 
of suppressing higher-order references and call/cc in contexts and provide
an operational explanation for the game-semantic conditions known as 
visibility and bracketing respectively. This allows us to refine the original 
model to provide fully abstract trace models of interaction with contexts 
that need not use higher-order references or call/cc. Along the way, we 
discuss the relationship between error- and termination-based contextual 
testing in each case, and relate the two to trace and complete trace 
equivalence respectively. 

Overall, the paper provides a systematic development of operational 
game semantics for all four cases, which represent the state-based face 
of the so-called semantic cube. 


Keywords: contextual equivalence, operational game semantics, higher- 
order references, control operators 


1 Introduction 


Research into contextual equivalence has a long tradition in programming lan- 
guage theory, due to its fundamental nature and applicability to numerous veri- 
fication tasks, such as the correctness of compiler optimisations. Capturing con- 
textual equivalence mathematically, i.e. the full abstraction problem [26], has 
been an important driving force in denotational semantics, which led, among 
others, to the development of game semantics [2,12]. Game semantics models 
computation through sequences of question- and answer-moves by two players, 
traditionally called O and P, who play the role of the context and the program 
respectively. Because of its interactive nature, it has often been referred to as a 
middle ground between denotational and operational semantics. 


* The full version is available at https://hal.archives-ouvertes.fr/hal-03116698.

Over the last three decades the game-semantic approach has led to numerous
fully abstract models for a whole spectrum of programming paradigms. Most
papers in this strand follow a rather abstract pattern when presenting the models,
emphasising structure and compositionality, often developing a correspondence
with a categorical framework along the way to facilitate proofs. The operational 
intuitions behind the games are somewhat obscured in this presentation, and 
left to be discovered through a deeper exploration of proofs. 

In contrast, operational game semantics aims to define models in which the 
interaction between the term and the environment is described through a care- 
fully instrumented labelled transition system (LTS), built using the syntax and 
operational semantics of the relevant language. Here, the derived trace seman- 
tics can be shown to be fully abstract. In this line of work, the dynamics is 
described more directly and provides operational intuitions about the meaning 
of moves, while not immediately giving structural insights about the structure 
of the traces. 

In this paper, we follow the operational approach and present a whole hier- 
archy of trace models for higher-order languages with varying access to higher- 
order state and control. As a vehicle for our study, we use HOSC, a call-by-value 
higher-order language equipped with general references and continuations. We 
also consider its sublanguages GOSC, HOS and GOS, obtained respectively by 
restricting storage to ground values, by removing continuations, and by imposing 
both restrictions. We study contextual testing of a class of HOSC terms using 
contexts from each of the languages $x \in \{HOSC, GOSC, HOS, GOS\}$; we write x
to refer to each case. Our working notion of convergence will be error reachability,
where an error is represented by a free variable. Accordingly, at the technical
level, we will study a family of equivalence relations $\cong^{x}_{err}$, each corresponding to
contextual testing with contexts from x, where contexts have the extra power
to abort the computation.

Our main results are trace models $Tr_x(\Gamma \vdash M)$ for each $x \in \{HOSC, GOSC, HOS, GOS\}$,
which capture $\cong^{x}_{err}$ through trace equivalence:

$$\Gamma \vdash M_1 \cong^{x}_{err} M_2 \text{ if and only if } Tr_x(\Gamma \vdash M_1) = Tr_x(\Gamma \vdash M_2).$$

It turns out that, for contexts with control (i.e. $x \in \{HOSC, GOSC\}$), $\cong^{x}_{err}$ coincides
with the standard notion of contextual equivalence based on termination,
written $\cong^{x}_{ter}$. However, in the other two cases, the former is strictly more
discriminating than the latter. We explain how to account for this difference in the
trace-based setting, using complete traces.

A common theme that has emerged in game semantics is the comparative 
study of the power of contexts, as it turned out possible to identify combina- 
torial conditions, namely visibility [3] and bracketing [22], that correspond to 
contextual testing in the absence of general references and control constructs 
respectively. In brief, visibility states that not all moves can be played, but only 
those that are enabled by a “visible part” of the interaction, which could be 
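thought of as functions currently in scope. Bracketing in turn imposes a discipline
on answers, requiring that the topmost question be answered first. In the
paper, we provide an operational reconstruction of both conditions.

$\sigma, \tau \triangleq Unit \mid Int \mid Bool \mid ref\,\tau \mid \tau \times \sigma \mid \tau \to \sigma \mid cont\,\tau$

$U, V \triangleq () \mid tt \mid ff \mid \hat{n} \mid x \mid \ell \mid \langle U, V \rangle \mid \lambda x^{\tau}.M \mid rec\ y(x^{\tau}).M \mid cont_{\tau}\,K$

$M, N \triangleq V \mid \langle M, N \rangle \mid \pi_i M \mid M N \mid ref_{\tau}\,M \mid\ !M \mid M := N \mid if\ M_1\ M_2\ M_3 \mid M \oplus N \mid M \boxdot N \mid M = N \mid call/cc_{\tau}(x.M) \mid throw_{\tau}\ M\ to\ N$

$K \triangleq \bullet \mid \langle V, K \rangle \mid \langle K, M \rangle \mid \pi_i K \mid V K \mid K M \mid ref_{\tau}\,K \mid\ !K \mid V := K \mid K := M \mid if\ K\ M\ N \mid K \oplus M \mid V \oplus K \mid K \boxdot M \mid V \boxdot K \mid K = M \mid V = K \mid throw_{\tau}\ V\ to\ K \mid throw_{\tau}\ K\ to\ M$

$C \triangleq \bullet \mid \langle M, C \rangle \mid \langle C, M \rangle \mid \pi_i C \mid \lambda x^{\tau}.C \mid rec\ y(x^{\tau}).C \mid M C \mid C M \mid ref_{\tau}\,C \mid\ !C \mid C := M \mid M := C \mid if\ C\ M\ N \mid if\ M\ C\ N \mid if\ M\ N\ C \mid C \oplus M \mid M \oplus C \mid C \boxdot M \mid M \boxdot C \mid C = M \mid M = C \mid call/cc_{\tau}(x.C) \mid throw_{\tau}\ C\ to\ M \mid throw_{\tau}\ M\ to\ C$

Notational conventions: $x, y \in Var$, $\ell \in Loc$, $n \in \mathbb{Z}$, $i \in \{1, 2\}$, $\oplus \in \{+, -, *\}$, $\boxdot \in \{=, <\}$

Syntactic sugar: let x = M in N stands for $(\lambda x.N)M$ (if x does not occur in N we also
write M; N)


Fig. 1. HOSC syntax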


Overall, we propose a unifying framework for studying higher-order languages 
with state and control, which we hope will make the techniques of (operational) 
game semantics clearer to the wider community. The construction of the fully 
abstract LTSs is by no means automatic, as there is no general methodology for 
extracting trace semantics from game models. Some attempts in that direction 
have been reported in [25], but the type discipline discussed there is far too weak 
to be applied to the languages we study. As the most immediate precursor to our 
work, we see the trace model of contextual interactions between HOS contexts 
and HOS terms from [23]. In comparison, the models developed in this paper 
are more general, as they consider the interaction between HOSC terms and 
contexts drawn from any of the four languages ranged over by x. 

In the 1990s, Abramsky proposed a research programme, originally called 
the semantic cube [1], which concerned investigating extensions of the purely 
functional programming language PCF along various axes. From this angle, the 
present paper is an operational study of a semantic diamond of languages with 
state, with GOS at the bottom, extending towards HOSC at the top, either via 
GOSC or HOS. 


2 HOSC 


The main objects of our study will be the language HOSC along with its frag- 
ments GOSC, HOS and GOS. HOSC is a higher-order programming language 
equipped with general references and continuations. 


$(K[(\lambda x^{\sigma}.M)V], h) \to (K[M\{V/x\}], h)$
$(K[\pi_i \langle V_1, V_2 \rangle], h) \to (K[V_i], h)$
$(K[if\ tt\ M_1\ M_2], h) \to (K[M_1], h)$
$(K[if\ ff\ M_1\ M_2], h) \to (K[M_2], h)$
$(K[\hat{n} \oplus \hat{m}], h) \to (K[\widehat{n \oplus m}], h)$
$(K[\hat{n} \boxdot \hat{m}], h) \to (K[b], h)$ with b = tt if $n \boxdot m$, otherwise b = ff
$(K[!\ell], h) \to (K[h(\ell)], h)$
$(K[ref\ V], h) \to (K[\ell], h \cdot [\ell \mapsto V])$
$(K[\ell := V], h) \to (K[()], h[\ell \mapsto V])$
$(K[\ell = \ell'], h) \to (K[b], h)$ with b = tt if $\ell = \ell'$, otherwise b = ff
$(K[(rec\ y(x^{\sigma}).M)V], h) \to (K[M\{V/x, U/y\}], h)$
$(K[call/cc(x.M)], h) \to (K[M\{cont\ K/x\}], h)$
$(K[throw\ V\ to\ cont\ K'], h) \to (K'[V], h)$


Fig. 2. Operational reduction for HOSC


Syntax. HOSC syntax is given in Figure 1. Assuming countably infinite sets
Loc (locations) and Var (variables), HOSC typing judgments take the form


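$\Sigma; \Gamma \vdash M : \tau$, where $\Sigma$ and $\Gamma$ are finite partial functions that assign types to
locations and variables respectively. In typing judgements, we often write $\Sigma \vdash$ as
shorthand for $\Sigma; \emptyset \vdash$ (closed) and $\Gamma \vdash$ as shorthand for $\emptyset; \Gamma \vdash$ (location-free). Similarly,
$\vdash M : \tau$ means $\emptyset; \emptyset \vdash M : \tau$.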


Operational semantics. A heap h is a finite type-respecting map from Loc to
values. We write $h : (\Sigma; \Gamma)$, if $dom(\Sigma) \subseteq dom(h)$ and $\Sigma; \Gamma \vdash h(\ell) : \sigma$ for
$(\ell, \sigma) \in \Sigma$. The operational semantics of HOSC reduces pairs (M, h), where
$\Sigma; \Gamma \vdash M : \tau$ and $h : (\Sigma; \Gamma)$. The rules are given in Figure 2, where $\{\cdot\}$ denotes
(capture-avoiding) substitution. We write $(M, h) \Downarrow_{ter}$ if there exist V, h' such
that $(M, h) \to^{*} (V, h')$ and V is a value.

We distinguish the following fragments of HOSC. 


Definition 1. — GOSC types are HOSC types except that reference types are
restricted to $\mathrm{ref}\ \iota$, where $\iota$ is given by the grammar
$\iota ::= \mathrm{Unit} \mid \mathrm{Int} \mid \mathrm{Bool} \mid \mathrm{ref}\ \iota$.
GOSC terms are HOSC terms whose typing derivations (i.e. not only the
final typing judgments) rely on GOSC types only. GOSC is a superset of
FOSC [8] (GOSC also includes references to references, the $\mathrm{ref}\ \iota$ case above).

— HOS types are HOSC types that do not feature the cont constructor. HOS
terms are HOSC terms whose typing derivations rely on HOS types only.
Consequently, HOS terms never have subterms of the form $\mathrm{call/cc}_{\tau}(x.M)$,
$\mathrm{throw}_{\tau}\ M\ \mathrm{to}\ N$ or $\mathrm{cont}_{\tau} K$.

— GOS is the intersection of HOS and GOSC, both for types and terms, i.e.
there are no continuations and storage is restricted to values of type $\iota$, defined
above.
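The type restrictions of Definition 1 can be made concrete with a small checker. The following is only a sketch: the Python encoding of types (classes `Base`, `Ref`, `Cont`, `Fun`, `Prod`) is a hypothetical one introduced for illustration, and it assumes that HOSC types are built from Unit, Int, Bool, references, continuations, functions and products.

```python
from dataclasses import dataclass

# Hypothetical encoding of HOSC types (names are illustrative, not from the paper).
@dataclass(frozen=True)
class Base:
    name: str            # "Unit", "Int" or "Bool"

@dataclass(frozen=True)
class Ref:
    arg: object

@dataclass(frozen=True)
class Cont:
    arg: object

@dataclass(frozen=True)
class Fun:
    dom: object
    cod: object

@dataclass(frozen=True)
class Prod:
    left: object
    right: object

def children(t):
    """Immediate subtypes of a type."""
    if isinstance(t, (Ref, Cont)):
        return [t.arg]
    if isinstance(t, Fun):
        return [t.dom, t.cod]
    if isinstance(t, Prod):
        return [t.left, t.right]
    return []

def is_iota(t):
    """iota ::= Unit | Int | Bool | ref iota"""
    return isinstance(t, Base) or (isinstance(t, Ref) and is_iota(t.arg))

def is_gosc(t):
    """Every reference type occurring in t must have the shape ref iota."""
    if isinstance(t, Ref):
        return is_iota(t.arg)
    return all(is_gosc(s) for s in children(t))

def is_hos(t):
    """The cont constructor must not occur anywhere in t."""
    return not isinstance(t, Cont) and all(is_hos(s) for s in children(t))

def is_gos(t):
    """GOS is the intersection of HOS and GOSC."""
    return is_gosc(t) and is_hos(t)
```

For instance, `Ref(Fun(Base("Unit"), Base("Unit")))` is an HOS type but not a GOSC type, whereas `Ref(Ref(Base("Int")))` belongs to all four fragments.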


Definition 2. Given a HOSC term $\Gamma \vdash M : \tau$, we refer to types in $\Gamma$ and $\tau$ as
boundary types. Let $x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$. We say that a HOSC
term $\Gamma \vdash M : \tau$ has an x boundary if all of its boundary types are from x.


Remark 1. Note that typing derivations of HOSC terms with an x boundary may
contain arbitrary HOSC types as long as the final typing judgment uses types
from x only. Consequently, if $x \neq \mathrm{HOSC}$, HOSC terms with an x boundary form
a strict superset of x.


Next we introduce several notions of contextual testing for HOSC terms, using
various kinds of contexts. For a start, we introduce the classic notion of
contextual approximation based on observing termination. The notions are
parameterized by x, indicating which language is used to build the testing contexts.
We write $\Gamma \vdash C : \tau \to \tau'$ if $\Gamma, x : \tau \vdash C[x] : \tau'$, and $\Gamma \vdash C : \tau$ if $\Gamma \vdash C : \tau \to \tau'$
for some $\tau'$.


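Definition 3 (Contextual Approximation). Let $x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$.
Given HOSC terms $\Gamma \vdash M_1, M_2 : \tau$ with an x boundary, we define
$\Gamma \vdash M_1 \lesssim^{x}_{ter} M_2$ to hold when, for all contexts $\vdash C : \tau$ built from the syntax of
x, if $(C[M_1], \emptyset) \Downarrow_{ter}$ then $(C[M_2], \emptyset) \Downarrow_{ter}$.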


We also consider another way of testing, based on observing whether a program
can reach a breakpoint (error point) inside a context. Technically, the
breakpoints are represented as occurrences of a special free error variable
$err : \mathrm{Unit} \to \mathrm{Unit}$. Reaching a breakpoint then corresponds to convergence to a stuck
configuration of the form $(K[err()], h)$: we write $(M, h) \Downarrow_{err}$ if there exist $K, h'$
such that $(M, h) \to^{*} (K[err()], h')$.


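Definition 4 (Contextual Approximation through Error). Suppose
$x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$. Given HOSC terms $\Gamma \vdash M_1, M_2 : \tau$ with an x
boundary and $err \notin \mathrm{dom}(\Gamma)$, we define $\Gamma \vdash M_1 \lesssim^{x}_{err} M_2$ to hold when, for all
contexts $err : \mathrm{Unit} \to \mathrm{Unit} \vdash C : \tau$ built from x-syntax, if $(C[M_1], \emptyset) \Downarrow_{err}$ then
$(C[M_2], \emptyset) \Downarrow_{err}$.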


For the languages in question, it will turn out that $\lesssim^{x}_{err}$ is at least as
discriminating as $\lesssim^{x}_{ter}$ for each $x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$, and that they coincide for
$x \in \{\mathrm{HOSC}, \mathrm{GOSC}\}$. We will write $\cong^{x}_{ter}$ and $\cong^{x}_{err}$ for the associated equivalence
relations.

For higher-order languages with state and control, it is well known that
contextual testing can be restricted to evaluation contexts after instantiating
the free variables of terms to closed values (the so-called closed instances of
use, CIU). Let us write $\Sigma; \Gamma' \vdash \gamma : \Gamma$ for substitutions $\gamma$ such that, for any
$(x, \sigma_x) \in \Gamma$, the term $\gamma(x)$ is a value satisfying $\Sigma; \Gamma' \vdash \gamma(x) : \sigma_x$. Then $M\{\gamma\}$
stands for the outcome of applying $\gamma$ to $M$.


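Definition 5 (CIU Approximation). Let $x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$
and let $\Gamma \vdash M_1, M_2 : \tau$ be HOSC terms with an x boundary.

— We write $\Gamma \vdash M_1 \lesssim^{x(ciu)}_{ter} M_2 : \tau$ when, for all $\Sigma, h, K, \gamma$, all built from x syntax,
such that $h : \Sigma$, $\Sigma \vdash K : \tau$ and $\Sigma \vdash \gamma : \Gamma$, we have that $(K[M_1\{\gamma\}], h) \Downarrow_{ter}$
implies $(K[M_2\{\gamma\}], h) \Downarrow_{ter}$.

— We write $\Gamma \vdash M_1 \lesssim^{x(ciu)}_{err} M_2 : \tau$ when, for all $\Sigma, h, K, \gamma$, all built from x
syntax, such that $h : \Sigma; \mathsf{err}$, $\Sigma; \mathsf{err} \vdash K : \tau$ and $\Sigma; \mathsf{err} \vdash \gamma : \Gamma$, we have that
$(K[M_1\{\gamma\}], h) \Downarrow_{err}$ implies $(K[M_2\{\gamma\}], h) \Downarrow_{err}$, where $err \notin \mathrm{dom}(\Gamma)$ and
$\mathsf{err}$ stands for $err : \mathrm{Unit} \to \mathrm{Unit}$.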


Results stating that “CIU tests suffice” are referred to as CIU lemmas. A general
framework for obtaining such results for higher-order languages with effects was
developed in [10,33]. The results stated therein are for termination-based testing,
i.e. $\Downarrow_{ter}$, but adapting them to $\Downarrow_{err}$ is not problematic.

Lemma 1 (CIU Lemma). Let $x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$ and $y \in \{ter, err\}$.
Then we have $\Gamma \vdash M_1 \lesssim^{x}_{y} M_2$ iff $\Gamma \vdash M_1 \lesssim^{x(ciu)}_{y} M_2$.


The preorders $\lesssim^{x}_{err}$ will be the central object of study in the paper. Among
others, we shall provide their alternative characterizations using trace semantics.
The characterizations will apply to a class of terms that we call cr-free.


Definition 6. A HOSC term $\Gamma \vdash M : \tau$ is cr-free if it does not contain
occurrences of $\mathrm{cont}_{\sigma} K$ and locations, and its boundary types are cont- and ref-free.


We stress that the boundary restriction applies to $\Gamma$ and $\tau$ only, and subterms
of $M$ may well contain arbitrary HOSC types and occurrences of $\mathrm{ref}_{\sigma}$, $\mathrm{call/cc}_{\sigma}$,
$\mathrm{throw}_{\sigma}$ for any $\sigma$. The majority of HOSC/GOSC/HOS/GOS examples studied
in the literature, e.g. [28,4,8], are actually cr-free. We will revisit some of
them as Examples 6, 7, 10. The fact that cr-free terms may not contain subterms
$\mathrm{cont}_{\sigma} K$ or $\ell$ is not really a restriction, as $\mathrm{cont}_{\sigma} K$ and $\ell$ are run-time
constructs rather than features meant to be used directly by programmers.
Finally, we note that the boundary of a cr-free term is an x boundary for any
$x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$. Thus, we can consider approximation between
cr-free terms for any x from the range, i.e. the notions $\lesssim^{x}_{err}$, $\lesssim^{x}_{ter}$ are all applicable.
Consequently, cr-free terms provide a common setting in which the discriminating
power of HOSC, GOSC, HOS and GOS contexts can be compared. We
discuss the scope for extending our results outside of the cr-free fragment, and
for richer type systems, in Section 7.


3 HOSC[HOSC] 


Recall that $\lesssim^{\mathrm{HOSC}}_{err}$ concerns testing HOSC terms with HOSC contexts.
Accordingly, we call this case HOSC[HOSC]. For $\mathrm{cont}_{\sigma}(K)$-free terms, we show that
$\lesssim^{\mathrm{HOSC}}_{err}$ and $\lesssim^{\mathrm{HOSC}}_{ter}$ coincide, which follows from the lemma below.


Lemma 2. Let $\Gamma \vdash M_1, M_2$ be HOSC terms not containing any occurrences of
$\mathrm{cont}_{\sigma}(K)$.

1. $\Gamma \vdash M_1 \lesssim^{x}_{err} M_2$ implies $\Gamma \vdash M_1 \lesssim^{x}_{ter} M_2$, for $x \in \{\mathrm{HOSC}, \mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$.
2. $\Gamma \vdash M_1 \lesssim^{x}_{ter} M_2$ implies $\Gamma \vdash M_1 \lesssim^{x}_{err} M_2$, for $x \in \{\mathrm{HOSC}, \mathrm{GOSC}\}$.


In what follows, after introducing several preliminary notions, we shall design a
labelled transition system (LTS) whose traces will turn out to capture contextual
interactions involved in testing cr-free terms according to $\lesssim^{\mathrm{HOSC}}_{err}$. This will
enable us to capture $\lesssim^{\mathrm{HOSC}}_{err}$ via trace inclusion. Actions of the LTS will refer to
functions and continuations in a symbolic way, using typed names.


3.1 Names and abstract values 


Definition 7. Let $\mathrm{FNames} = \biguplus_{\sigma,\sigma'} \mathrm{FNames}_{\sigma \to \sigma'}$ be the set of function names,
partitioned into mutually disjoint countably infinite sets $\mathrm{FNames}_{\sigma \to \sigma'}$. We will
use $f, g$ to range over FNames and write $f : \sigma \to \sigma'$ for $f \in \mathrm{FNames}_{\sigma \to \sigma'}$.
Analogously, let $\mathrm{CNames} = \biguplus_{\sigma} \mathrm{CNames}_{\sigma}$ be the set of continuation names.
We will use $c, d$ to range over CNames, and write $c : \sigma$ for $c \in \mathrm{CNames}_{\sigma}$. Note
that the constants represent continuations, so the “real” type of $c$ is $\mathrm{cont}\ \sigma$, but
we write $c : \sigma$ for the sake of brevity. We assume that CNames, FNames are
disjoint and let $\mathrm{Names} = \mathrm{FNames} \uplus \mathrm{CNames}$. Elements of Names will be woven
into various constructions in the paper, e.g. terms, heaps, etc. We will then write
$\nu(X)$ to refer to the set of names used in some entity $X$.


Because of the shape of boundary types in cr-free terms and, in particular, the
presence of product types, the values that will be exchanged between the context
and the program take the form of tuples consisting of (), integers, booleans
and functions. To describe such scenarios, we introduce the notion of abstract
values, which are patterns that match such values. Abstract values are generated
by the grammar

\[ A, B \triangleq () \mid tt \mid ff \mid \widehat{n} \mid f \mid \langle A, B \rangle \]

with the proviso that, in any abstract value, a name may occur at most once. As
function names are intrinsically typed, we can assign types to abstract values in
the obvious way, writing $A : \tau$.


3.2 Actions and traces 


Our LTS will be based on four kinds of actions, listed below. Each action will be
equipped with a polarity, which is either Player (P) or Opponent (O). P-actions
describe interaction steps made by a tested term, while O-actions involve the
context. P-actions are written with an overline.


— Player Answer (PA) $\bar{c}(A)$, where $c : \sigma$ and $A : \sigma$. This action corresponds
to the term sending an abstract value $A$ through a continuation name $c$.

— Player Question (PQ) $\bar{f}(A, c)$, where $f : \sigma \to \sigma'$, $A : \sigma$ and $c : \sigma'$. Here,
an abstract value $A$ and a continuation name $c$ are sent by the term through
a function name $f$.

— Opponent Answer (OA) $c(A)$, where $c : \sigma$ and $A : \sigma$. In this case, an abstract
value $A$ is received from the environment via the continuation name $c$.

— Opponent Question (OQ) $f(A, c)$, where $f : \sigma \to \sigma'$, $A : \sigma$ and $c : \sigma'$.
Finally, this action corresponds to receiving an abstract value $A$ and a
continuation name $c$ from the environment through a function name $f$.


In what follows, $a$ is used to range over actions. We will say that a name is
introduced by an action $a$ if it is sent or received in $a$. If $a$ is an O-action (resp.
P-action), we say that the name was introduced by O (resp. P). An action $a$ is
justified by another action $a'$ if the name that $a$ uses to communicate, i.e. $f$ in
questions ($\bar{f}(A, c)$, $f(A, c)$) and $c$ in answers ($\bar{c}(A)$, $c(A)$), has been introduced
by $a'$.

We will work with sequences of actions of a very special shape, specified
below. The definition assumes two given sets of names, $N_P$ and $N_O$, which
represent names that have already been introduced by P and O respectively.

Definition 8. Let $N_O, N_P \subseteq \mathrm{Names}$. An $(N_O, N_P)$-trace is a sequence $t$ of
actions such that:


— the actions alternate between Player and Opponent actions;
— no name is introduced twice;
— names from $N_O$, $N_P$ need no introduction;
— if an action $a$ uses a name to communicate then
  • $a = \bar{f}(A, c)$ ($f \in N_O$) or $a = \bar{c}(A)$ ($c \in N_O$) or $a = f(A, c)$ ($f \in N_P$) or
    $a = c(A)$ ($c \in N_P$), or
  • the name has been introduced by an earlier action $a'$ of opposite polarity.


Note that, due to the shape of actions, a continuation name can only be
introduced/justified by a question. Moreover, because names are never introduced
twice, if $a'$ justifies $a$ then $a'$ is uniquely determined in a given trace. Readers
familiar with game semantics will recognize that traces are very similar to
alternating justified sequences except that traces need not be started by O.


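Example 1. Let $(N_O, N_P) = (\{c\}, \emptyset)$ where
$c : \tau = ((\mathrm{Unit} \to \mathrm{Unit}) \to \mathrm{Unit}) \times (\mathrm{Unit} \to \mathrm{Int})$.
Then the following sequence is an $(N_O, N_P)$-trace:

\[ t_1 = \bar{c}(\langle g_1, g_2 \rangle)\ g_1(f_1, c_1)\ \bar{f}_1((), c_2)\ c_2(())\ \bar{c}_1(())\ c_2(())\ \bar{c}_1(())\ g_2((), c_3)\ \bar{c}_3(2) \]

where $g_1 : (\mathrm{Unit} \to \mathrm{Unit}) \to \mathrm{Unit}$, $g_2 : \mathrm{Unit} \to \mathrm{Int}$, $f_1 : \mathrm{Unit} \to \mathrm{Unit}$, $c_1, c_2 : \mathrm{Unit}$,
$c_3 : \mathrm{Int}$.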
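The conditions of Definition 8 can be checked mechanically. Below is a minimal sketch in Python, under a hypothetical encoding introduced only for illustration: abstract values are nested tuples (`("name", n)`, `("pair", A, B)`, `("const", v)`), an action records its polarity, the name it communicates on, its abstract value and, for questions, the continuation name. Types are ignored, so only alternation, freshness and justification are checked.

```python
from dataclasses import dataclass
from typing import Optional

def names_of(a):
    """Names occurring in an abstract value encoded as nested tuples."""
    if a[0] == "name":
        return [a[1]]
    if a[0] == "pair":
        return names_of(a[1]) + names_of(a[2])
    return []                      # ("const", v) carries no names

@dataclass(frozen=True)
class Action:
    pol: str                       # "P" or "O"
    name: str                      # name used to communicate
    arg: tuple                     # abstract value
    cont: Optional[str] = None     # continuation name (questions only)

def is_trace(n_o, n_p, trace):
    """Untyped check of the (N_O, N_P)-trace conditions."""
    owner = {n: "O" for n in n_o}          # who introduced each name
    owner.update({n: "P" for n in n_p})
    previous = None
    for a in trace:
        if a.pol == previous:              # polarities must alternate
            return False
        previous = a.pol
        # the name used must have been introduced by the other participant
        if owner.get(a.name) in (None, a.pol):
            return False
        fresh = names_of(a.arg) + ([a.cont] if a.cont is not None else [])
        if len(set(fresh)) != len(fresh) or any(n in owner for n in fresh):
            return False                   # no name is introduced twice
        for n in fresh:
            owner[n] = a.pol
    return True

UNIT = ("const", ())
T1 = [
    Action("P", "c", ("pair", ("name", "g1"), ("name", "g2"))),
    Action("O", "g1", ("name", "f1"), "c1"),
    Action("P", "f1", UNIT, "c2"),
    Action("O", "c2", UNIT),
    Action("P", "c1", UNIT),
    Action("O", "c2", UNIT),
    Action("P", "c1", UNIT),
    Action("O", "g2", UNIT, "c3"),
    Action("P", "c3", ("const", 2)),
]
```

With this encoding, `is_trace({"c"}, set(), T1)` holds, matching Example 1.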


3.3 Extended syntax and reduction 


We extend the definition of HOSC presented in Figure 2 to take into account
these names. We refine the operational reduction using continuation names to
keep track of the toplevel continuation. We list all the changes below.


— Function names are added to the syntax as constants. Since they are meant
to represent values, they are also considered to be syntactic values in the
extended language.

\[ \frac{f \in \mathrm{FNames}_{\sigma \to \sigma'}}{\Sigma; \Gamma \vdash f : \sigma \to \sigma'} \]

— Continuation names are not terms on their own. Instead, they are built into
the syntax via a new construct $\mathrm{cont}_{\sigma}(K, c)$, subject to the following typing
rule.

\[ \frac{\Sigma; \Gamma \vdash K : \sigma \to \sigma' \qquad c \in \mathrm{CNames}_{\sigma'}}{\Sigma; \Gamma \vdash \mathrm{cont}_{\sigma}(K, c) : \mathrm{cont}\ \sigma} \]

$\mathrm{cont}_{\sigma}(K, c)$ is a staged continuation that first evaluates terms inside $K$ and,
if this produces a value, the value is passed to $c$. This operational meaning
will be implemented through a suitable reduction rule, to be discussed next.
$\mathrm{cont}_{\sigma}(K, c)$ is also regarded as a value. Note that we remove the old construct
$\mathrm{cont}_{\sigma} K$ from the extended syntax.

— The operational semantics underpinning the LTS is based on triples
$(M, c, h)$ such that $\Sigma; \Gamma \vdash M : \sigma$, $c \in \mathrm{CNames}_{\sigma}$ and $h : \Sigma$. The continuation
name $c$ is used to represent the surrounding context, which is left abstract.
The previous operational rules on pairs are embedded into the new reduction on
triples using the rule below.

\[ \frac{(M, h) \to (M', h')}{(M, c, h) \to (M', c, h')} \]

The two reduction rules related to continuations, previously used to define
$\to$ on pairs, are not included. Instead we use the following rules, which take
advantage of the extended syntax.

\[
\begin{array}{lcl}
(K[\mathrm{call/cc}_{\sigma}(x.M)], c, h) & \to & (K[M\{\mathrm{cont}_{\sigma}(K, c)/x\}], c, h) \\
(K[\mathrm{throw}_{\sigma}\ V\ \mathrm{to}\ \mathrm{cont}_{\sigma}(K', c')], c, h) & \to & (K'[V], c', h)
\end{array}
\]


3.4 Configurations 


We write Vals for the extended set of syntactic values, i.e. $\mathrm{FNames} \subseteq \mathrm{Vals}$.
Let ECtxs stand for the set of extended evaluation contexts, defined as $K$ in
Figure 1 taking the extended definition of values into account. Before defining the
transition relation of our LTS, we discuss the shape of configurations, providing
intuitions behind each component.

Passive configurations take the form $\langle \gamma, \xi, \phi, h \rangle$ and are meant to represent
stages at which the environment is to make a move.


— $\gamma : (\mathrm{FNames} \rightharpoonup \mathrm{Vals}) \uplus (\mathrm{CNames} \rightharpoonup \mathrm{ECtxs})$ is a finite map. It will play the
role of an environment that relates function names communicated to the
environment (i.e. those introduced by P) to syntactic values, and continuation
names introduced by P to evaluation contexts.

— $\xi : \mathrm{CNames} \rightharpoonup \mathrm{CNames}$ is a finite map. It complements the role of $\gamma$ for
continuation names and indicates the continuation to which the outcome of
applying $\gamma(c)$ should be passed.

— $\phi \subseteq \mathrm{Names}$. The set $\phi$ will be used to collect all the names used in the
interaction, regardless of which participant introduced them. Following our
description above, those introduced by O will correspond to $\phi \setminus \mathrm{dom}(\gamma)$.


The components satisfy healthiness conditions, implied by their role in the
system. Let $\Sigma = \mathrm{dom}(h)$.


— If $f \in \mathrm{dom}(\gamma) \cap \mathrm{FNames}_{\sigma \to \sigma'}$ then $\gamma(f)$ is a value such that $\Sigma \vdash \gamma(f) : \sigma \to \sigma'$.

— $\mathrm{dom}(\xi) = \mathrm{dom}(\gamma) \cap \mathrm{CNames}$.

— If $c \in \mathrm{dom}(\gamma) \cap \mathrm{CNames}_{\sigma}$ and $\Sigma \vdash \gamma(c) : \sigma \to \sigma'$ then $\xi(c) \in \mathrm{CNames}_{\sigma'}$.

— Finally, names introduced by the environment and communicated to the
program may end up in the environments and the heap: $\nu(\mathrm{img}(\gamma))$, $\nu(\mathrm{img}(\xi))$,
$\nu(\mathrm{img}(h)) \subseteq \phi \setminus \mathrm{dom}(\gamma)$.

Active configurations take the form $\langle M, c, \gamma, \xi, \phi, h \rangle$ and represent interaction
steps of the term. The $\gamma, \xi, \phi, h$ components have already been described above.
For $M$ and $c$, given $\Sigma = \mathrm{dom}(h)$, we will have $\Sigma; \emptyset \vdash M : \sigma$, $c \in \mathrm{CNames}_{\sigma}$ and
$\nu(M) \cup \{c\} \subseteq \phi \setminus \mathrm{dom}(\gamma)$.


3.5 Transitions


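Observe that any closed value $V$ of a cont- and ref-free type $\sigma$ can be decomposed
into an abstract value $A$ (pattern) and the corresponding substitution $\gamma$
(matching). The set of all such decompositions, written $\mathrm{AVal}_{\sigma}(V)$, is defined
below. Given a value $V$ of a (cr-free) type $\sigma$, $\mathrm{AVal}_{\sigma}(V)$ contains all pairs $(A, \gamma)$
such that $A$ is an abstract value and $\gamma : \nu(A) \to \mathrm{Vals}$ is a substitution such that
$A\{\gamma\} = V$. More concretely,

\[
\begin{array}{lcl}
\mathrm{AVal}_{\sigma}(V) & \triangleq & \{(V, \emptyset)\} \quad \text{for } \sigma \in \{\mathrm{Unit}, \mathrm{Bool}, \mathrm{Int}\} \\
\mathrm{AVal}_{\sigma \to \sigma'}(V) & \triangleq & \{(f, [f \mapsto V]) \mid f \in \mathrm{FNames}_{\sigma \to \sigma'}\} \\
\mathrm{AVal}_{\sigma \times \sigma'}(\langle U, V \rangle) & \triangleq & \{(\langle A_1, A_2 \rangle, \gamma_1 \cdot \gamma_2) \mid (A_1, \gamma_1) \in \mathrm{AVal}_{\sigma}(U),\ (A_2, \gamma_2) \in \mathrm{AVal}_{\sigma'}(V)\}
\end{array}
\]

Note that, by writing $\cdot$, we mean to implicitly require that the function domains
be disjoint. Similarly, when writing $\uplus$, we stipulate that the argument sets be
disjoint.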

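Example 2. Let $\sigma = (\mathrm{Int} \to \mathrm{Bool}) \times (\mathrm{Int} \times (\mathrm{Unit} \to \mathrm{Int}))$ and
$V = \langle \lambda x^{\mathrm{Int}}.\, x \neq 1, \langle 2, \lambda x^{\mathrm{Unit}}.\, 3 \rangle \rangle$. Then $\mathrm{AVal}_{\sigma}(V)$ equals

\[ \{ (\langle f, \langle 2, g \rangle \rangle,\ [f \mapsto (\lambda x^{\mathrm{Int}}.\, x \neq 1)] \cdot [g \mapsto (\lambda x^{\mathrm{Unit}}.\, 3)]) \mid f \in \mathrm{FNames}_{\mathrm{Int} \to \mathrm{Bool}},\ g \in \mathrm{FNames}_{\mathrm{Unit} \to \mathrm{Int}} \}. \]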
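The decomposition into a pattern and a matching can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's construction: values are modelled by Python integers, booleans, `()` for unit, 2-tuples for pairs and callables for functions; types are dropped; and instead of the whole set of decompositions, the hypothetical helper `decompose` returns the single element determined by a supply of fresh names.

```python
import itertools

def decompose(value, fresh):
    """Return one (abstract value, matching) pair for `value`.

    Abstract values are nested tuples: ("name", f), ("pair", A, B), ("const", v).
    `fresh` is an iterator yielding names that have not been used before.
    """
    if callable(value):
        f = next(fresh)
        return ("name", f), {f: value}
    if isinstance(value, tuple) and len(value) == 2:
        a1, g1 = decompose(value[0], fresh)
        a2, g2 = decompose(value[1], fresh)
        # the two matchings have disjoint domains because names are fresh
        return ("pair", a1, a2), {**g1, **g2}
    return ("const", value), {}

def instantiate(abstract, matching):
    """Apply a matching to an abstract value, i.e. compute A{gamma}."""
    if abstract[0] == "name":
        return matching[abstract[1]]
    if abstract[0] == "pair":
        return (instantiate(abstract[1], matching),
                instantiate(abstract[2], matching))
    return abstract[1]

def name_supply(prefix="f"):
    return (f"{prefix}{i}" for i in itertools.count(1))
```

Applied to a Python rendering of the value from Example 2, `decompose` yields the pattern `<f1, <2, f2>>` together with a matching for `f1` and `f2`, and `instantiate` recovers the original value.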


Finally, we present the transitions of what we call the HOSC[HOSC] LTS in 
Figure 3. 


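Example 3. Below we analyse the (PQ) rule in more detail.

\[ \langle K[fV], c, \gamma, \xi, \phi, h \rangle \xrightarrow{\bar{f}(A, c')} \langle \gamma \cdot \gamma' \cdot [c' \mapsto K],\ \xi \cdot [c' \mapsto c],\ \phi \uplus \nu(A) \uplus \{c'\},\ h \rangle \]

when $f : \sigma \to \sigma'$, $(A, \gamma') \in \mathrm{AVal}_{\sigma}(V)$ and $c' : \sigma'$.


The use of $\uplus$ in $\phi \uplus \nu(A) \uplus \{c'\}$ is meant to highlight the requirement that the
names introduced in $\bar{f}(A, c')$, i.e. $\nu(A) \cup \{c'\}$, should be fresh and disjoint from $\phi$.
Moreover, note how $\gamma$ and $\xi$ are updated. In general, $\gamma, \xi, h$ are updated during
P-actions.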


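Definition 9. Given two configurations $C, C'$, we write $C \stackrel{a}{\Rightarrow} C'$ if
$C \xrightarrow{\tau}{}^{*}\ C'' \xrightarrow{a} C'$, with $\xrightarrow{\tau}{}^{*}$ representing multiple (possibly none) $\tau$-actions.
This notation is extended to sequences of actions: given $t = a_1 \ldots a_n$, we write
$C \stackrel{t}{\Rightarrow} C'$ if there exist $C_1, \ldots, C_{n-1}$ such that
$C \stackrel{a_1}{\Rightarrow} C_1 \cdots C_{n-1} \stackrel{a_n}{\Rightarrow} C'$. We define
$\mathrm{Tr}_{\mathrm{HOSC}}(C) = \{t \mid \text{there exists } C' \text{ such that } C \stackrel{t}{\Rightarrow} C'\}$.


Lemma 3. Suppose $C = \langle \gamma, \xi, \phi, h \rangle$ or $C = \langle M, c, \gamma, \xi, \phi, h \rangle$ is a configuration.
Then elements of $\mathrm{Tr}_{\mathrm{HOSC}}(C)$ are $(\phi \setminus \mathrm{dom}(\gamma), \mathrm{dom}(\gamma))$-traces.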




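\[
\begin{array}{ll}
(P\tau) & \langle M, c, \gamma, \xi, \phi, h \rangle \xrightarrow{\ \tau\ } \langle N, c', \gamma, \xi, \phi, h' \rangle \\
 & \text{when } (M, c, h) \to (N, c', h') \\[4pt]
(PA) & \langle V, c, \gamma, \xi, \phi, h \rangle \xrightarrow{\bar{c}(A)} \langle \gamma \cdot \gamma', \xi, \phi \uplus \nu(A), h \rangle \\
 & \text{when } c : \sigma,\ (A, \gamma') \in \mathrm{AVal}_{\sigma}(V) \\[4pt]
(PQ) & \langle K[fV], c, \gamma, \xi, \phi, h \rangle \xrightarrow{\bar{f}(A, c')} \langle \gamma \cdot \gamma' \cdot [c' \mapsto K],\ \xi \cdot [c' \mapsto c],\ \phi \uplus \nu(A) \uplus \{c'\},\ h \rangle \\
 & \text{when } f : \sigma \to \sigma',\ (A, \gamma') \in \mathrm{AVal}_{\sigma}(V),\ c' : \sigma' \\[4pt]
(OA) & \langle \gamma, \xi, \phi, h \rangle \xrightarrow{c(A)} \langle K[A], c', \gamma, \xi, \phi \uplus \nu(A), h \rangle \\
 & \text{when } c : \sigma,\ A : \sigma,\ \gamma(c) = K,\ \xi(c) = c' \\[4pt]
(OQ) & \langle \gamma, \xi, \phi, h \rangle \xrightarrow{f(A, c)} \langle V A, c, \gamma, \xi, \phi \uplus \nu(A) \uplus \{c\}, h \rangle \\
 & \text{when } f : \sigma \to \sigma',\ A : \sigma,\ c : \sigma',\ \gamma(f) = V
\end{array}
\]

NB $c : \sigma$ stands for $c \in \mathrm{CNames}_{\sigma}$.


Fig. 3. HOSC[HOSC] LTS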


$M^{cwl}_1$ :
  let x = ref 0 in
  let b = ref ff in
  ⟨λf. if ¬(!b) then
         b := tt; f(); x := !x + 1;
         b := ff;
       else (),
   λ_ : Unit. !x⟩

$M^{cwl}_2$ :
  let x = ref 0 in
  let b = ref ff in
  ⟨λf. if ¬(!b) then
         b := tt; let n = !x in f(); x := n + 1;
         b := ff;
       else (),
   λ_ : Unit. !x⟩


Fig. 4. Callback-with-lock Example [4]


Example 4. In Figure 5, we show that the trace from Example 1 is generated
by the configuration $C = \langle M^{cwl}_1, c, \emptyset, \emptyset, \{c\}, \emptyset \rangle$, where $M^{cwl}_1$ is given in Figure 4.
We write $inc \triangleq \lambda f.\ \mathrm{if}\ \neg(!\ell_b)\ (\ell_b := tt;\ f();\ \ell_x := !\ell_x + 1;\ \ell_b := ff)\ ()$, $get \triangleq \lambda\_.\ !\ell_x$
and $c : ((\mathrm{Unit} \to \mathrm{Unit}) \to \mathrm{Unit}) \times (\mathrm{Unit} \to \mathrm{Int})$. It is interesting to notice that
in this interaction, Opponent uses the continuation $N$ twice, incrementing the
counter $x$ by two. The second time, it does so without having to call $inc$ again,
but rather by using the continuation name $c_2$.
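The effect of answering on $c_2$ twice can be mimicked without first-class continuations by modelling the pending continuation as a closure. The sketch below is an illustration only: the helper names (`make_m1`, `make_m2`, `answer_twice`) are hypothetical, the heap is a Python dictionary, and the call `f()` is modelled as the point where `inc` stops and hands the rest of its body to Opponent as a resumable closure.

```python
def make_m1():
    heap = {"x": 0, "b": False}

    def inc():
        # Run inc up to the call f(); return the pending continuation
        # (the code after f()), or None if the lock b is already taken.
        if heap["b"]:
            return None
        heap["b"] = True

        def resume():                 # x := !x + 1; b := ff
            heap["x"] = heap["x"] + 1
            heap["b"] = False
        return resume

    def get():
        return heap["x"]
    return inc, get

def make_m2():
    heap = {"x": 0, "b": False}

    def inc():
        if heap["b"]:
            return None
        heap["b"] = True
        n = heap["x"]                 # let n = !x in f(); ...

        def resume():                 # x := n + 1; b := ff
            heap["x"] = n + 1
            heap["b"] = False
        return resume

    def get():
        return heap["x"]
    return inc, get

def answer_twice(make):
    """Call inc once, answer on the pending continuation twice, then read x."""
    inc, get = make()
    c2 = inc()
    c2()
    c2()
    return get()

print(answer_twice(make_m1))  # 2
print(answer_twice(make_m2))  # 1
```

The first term reads `x` each time the continuation is resumed, while the second uses the value of `x` saved before the callback, which is why the two terms produce different final answers under this interaction.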


Remark 2. Due to the freedom of name choice, note that $\mathrm{Tr}_{\mathrm{HOSC}}(C)$ is closed
under type-preserving renamings that preserve names from $C$.


3.6 Correctness and full abstraction 


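We define two kinds of special configurations that will play an important role
in spelling out correctness results for the HOSC[HOSC] LTS. Let
$\Gamma = \{x_1 : \sigma_1, \cdots, x_k : \sigma_k\}$. A map $\rho$ from $\{x_1, \cdots, x_k\}$ to the set of abstract values will
be called a $\Gamma$-assignment provided, for all $1 \le i \neq j \le k$, we have $\rho(x_i) : \sigma_i$
and $\nu(\rho(x_i)) \cap \nu(\rho(x_j)) = \emptyset$.

\[
\begin{array}{rll}
 & \langle M^{cwl}_1, c, \emptyset, \emptyset, \{c\}, \emptyset \rangle & \\
\xrightarrow{\tau}{}^{*} & \langle \langle inc, get \rangle, c, \emptyset, \emptyset, \{c\}, [\ell_b \mapsto ff, \ell_x \mapsto 0] \rangle & \\
\xrightarrow{\bar{c}(\langle g_1, g_2 \rangle)} & \langle \gamma_1, \emptyset, \{c, g_1, g_2\}, [\ell_b \mapsto ff, \ell_x \mapsto 0] \rangle & \text{with } \gamma_1 = [g_1 \mapsto inc, g_2 \mapsto get] \\
\xrightarrow{g_1(f_1, c_1)} & \langle inc\ f_1, c_1, \gamma_1, \emptyset, \phi_2, [\ell_b \mapsto ff, \ell_x \mapsto 0] \rangle & \text{with } \phi_2 = \{c, g_1, g_2, f_1, c_1\} \\
\xrightarrow{\tau}{}^{*} & \langle f_1(); N, c_1, \gamma_1, \emptyset, \phi_2, [\ell_b \mapsto tt, \ell_x \mapsto 0] \rangle & \text{with } N = \ell_x := !\ell_x + 1;\ \ell_b := ff \\
\xrightarrow{\bar{f}_1((), c_2)} & \langle \gamma_2, \xi, \phi_3, [\ell_b \mapsto tt, \ell_x \mapsto 0] \rangle & \text{with } \gamma_2 = \gamma_1 \cdot [c_2 \mapsto \bullet; N], \\
 & & \xi = [c_2 \mapsto c_1] \text{ and } \phi_3 = \phi_2 \uplus \{c_2\} \\
\xrightarrow{c_2(())} & \langle (); N, c_1, \gamma_2, \xi, \phi_3, [\ell_b \mapsto tt, \ell_x \mapsto 0] \rangle & \\
\xrightarrow{\tau}{}^{*} & \langle (), c_1, \gamma_2, \xi, \phi_3, [\ell_b \mapsto ff, \ell_x \mapsto 1] \rangle & \\
\xrightarrow{\bar{c}_1(())} & \langle \gamma_2, \xi, \phi_3, [\ell_b \mapsto ff, \ell_x \mapsto 1] \rangle & \\
\xrightarrow{c_2(())} & \langle (); N, c_1, \gamma_2, \xi, \phi_3, [\ell_b \mapsto ff, \ell_x \mapsto 1] \rangle & \\
\xrightarrow{\tau}{}^{*} & \langle (), c_1, \gamma_2, \xi, \phi_3, [\ell_b \mapsto ff, \ell_x \mapsto 2] \rangle & \\
\xrightarrow{\bar{c}_1(())} & \langle \gamma_2, \xi, \phi_3, [\ell_b \mapsto ff, \ell_x \mapsto 2] \rangle & \\
\xrightarrow{g_2((), c_3)} & \langle get\ (), c_3, \gamma_2, \xi, \phi_4, [\ell_b \mapsto ff, \ell_x \mapsto 2] \rangle & \text{with } \phi_4 = \phi_3 \uplus \{c_3\} \\
\xrightarrow{\tau}{}^{*} & \langle 2, c_3, \gamma_2, \xi, \phi_4, [\ell_b \mapsto ff, \ell_x \mapsto 2] \rangle & \\
\xrightarrow{\bar{c}_3(2)} & \langle \gamma_2, \xi, \phi_4, [\ell_b \mapsto ff, \ell_x \mapsto 2] \rangle &
\end{array}
\]

Fig. 5. Trace derivation in the HOSC[HOSC] LTS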


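Definition 10 (Program configuration). Given a $\Gamma$-assignment $\rho$, a cr-free
HOSC term $\Gamma \vdash M : \tau$ and $c : \tau$, we define the active configuration $C^{\rho,c}_{M}$ by
$C^{\rho,c}_{M} = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset \rangle$.


Note that traces from $\mathrm{Tr}_{\mathrm{HOSC}}(C^{\rho,c}_{M})$ will be $(\nu(\rho) \cup \{c\}, \emptyset)$-traces.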


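Definition 11. The HOSC[HOSC] trace semantics of a cr-free HOSC term
$\Gamma \vdash M : \tau$ is defined to be

\[ \mathrm{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M : \tau) = \{ ((\rho, c), t) \mid \rho \text{ is a } \Gamma\text{-assignment},\ c : \tau,\ t \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\rho,c}_{M}) \}. \]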


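Example 5. Recall the term $\vdash M^{cwl}_1 : \tau$ from Example 4, the trace $t_1$ and the
configuration $C$ such that $t_1 \in \mathrm{Tr}_{\mathrm{HOSC}}(C)$. Because $M^{cwl}_1$ is closed ($\Gamma = \emptyset$),
the only $\Gamma$-assignment is the empty map $\emptyset$. Thus, $C = C^{\emptyset,c}_{M^{cwl}_1}$, so
$((\emptyset, c), t_1) \in \mathrm{Tr}_{\mathrm{HOSC}}(\vdash M^{cwl}_1 : \tau)$.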


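Having defined active configurations associated with terms, we now define
passive configurations associated with contexts. Let us fix $\circ \in \mathrm{FNames}_{\mathrm{Unit} \to \mathrm{Unit}}$
and, for each $\sigma$, a continuation name $\circ_{\sigma} \in \mathrm{CNames}_{\sigma}$. Let $\vec{\circ} = \biguplus_{\sigma} \{\circ_{\sigma}\}$.
Intuitively, the name $\circ$ will correspond to $\Downarrow_{err}$ and the names $\circ_{\sigma}$ to $\Downarrow_{ter}$.

Recall that $\mathsf{err}$ stands for $err : \mathrm{Unit} \to \mathrm{Unit}$. Given a heap $h : \Sigma; \mathsf{err}$, an
evaluation context $\Sigma; \mathsf{err} \vdash K : \tau \to \tau'$ and a substitution $\Sigma; \mathsf{err} \vdash \gamma : \Gamma$ (as in
the definition of $\lesssim^{\mathrm{HOSC}(ciu)}_{err}$), let us replace every occurrence of $\mathrm{cont}_{\sigma} K'$ inside
$h, K, \gamma$ with $\mathrm{cont}_{\sigma}(K', \circ_{\sigma'})$, if $K'$ has type $\sigma \to \sigma'$. Moreover, let us replace
every occurrence of the variable $err$ with the function name $\circ$. This is done to
adjust $h, K, \gamma$ to the extended syntax of the LTS: the upgraded versions are
called $h_{\circ}, K_{\circ}, \gamma_{\circ}$.

Next we define the set $\mathrm{AVal}_{\Gamma}(\gamma)$ of all disjoint decompositions of values from
$\gamma_{\circ}$ into abstract values and the corresponding matchings. Recall that
$\Gamma = \{x_1 : \sigma_1, \cdots, x_k : \sigma_k\}$. Below $\vec{A_i}$ stands for $(A_1, \cdots, A_k)$, and $\vec{\gamma_i}$ for $(\gamma_1, \cdots, \gamma_k)$.

\[ \mathrm{AVal}_{\Gamma}(\gamma) = \{ (\vec{A_i}, \vec{\gamma_i}) \mid (A_i, \gamma_i) \in \mathrm{AVal}_{\sigma_i}(\gamma_{\circ}(x_i)),\ i = 1, \cdots, k;\ \nu(A_1), \cdots, \nu(A_k) \text{ mutually disjoint and without } \circ \} \]

Definition 12 (Context configuration). Given $\Sigma$, $h : \Sigma; \mathsf{err}$, $\Sigma; \mathsf{err} \vdash K : \tau \to \tau'$,
$\Sigma; \mathsf{err} \vdash \gamma : \Gamma$, $(\vec{A_i}, \vec{\gamma_i}) \in \mathrm{AVal}_{\Gamma}(\gamma)$ and $c : \tau$ ($c \notin \vec{\circ}$), the corresponding
configuration $C^{\vec{\gamma_i},c}_{h,K,\gamma}$ is defined by

\[ C^{\vec{\gamma_i},c}_{h,K,\gamma} = \Big\langle \biguplus_{i=1}^{k} \gamma_i \uplus \{c \mapsto K_{\circ}\},\ \{c \mapsto \circ_{\tau'}\},\ \biguplus_{i=1}^{k} \nu(A_i) \uplus \{c\} \uplus \vec{\circ} \uplus \{\circ\},\ h_{\circ} \Big\rangle. \]

Intuitively, the names $\nu(A_i)$ correspond to calling function values extracted from
$\gamma$, whereas $c$ corresponds to $K$. Note that traces in $\mathrm{Tr}_{\mathrm{HOSC}}(C^{\vec{\gamma_i},c}_{h,K,\gamma})$ will be
$(\vec{\circ} \uplus \{\circ\}, \biguplus_{i=1}^{k} \nu(A_i) \uplus \{c\})$-traces.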
In preparation for the next result, we introduce the following shorthands. 

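— Given an $(N_O, N_P)$-trace $t$, we write $t^{\perp}$ for the $(N_P, N_O)$-trace obtained by
changing the polarity of each action: $f(A, c')$ becomes $\bar{f}(A, c')$ (and vice versa)
and $c(A)$ becomes $\bar{c}(A)$ (and vice versa).

— Given $(\vec{A_i}, \vec{\gamma_i}) \in \mathrm{AVal}_{\Gamma}(\gamma)$, we define a $\Gamma$-assignment $\rho_{\vec{A_i}}$ by $\rho_{\vec{A_i}}(x_i) = A_i$.
Note that $\nu(\rho_{\vec{A_i}}) = \biguplus_{i=1}^{k} \mathrm{dom}(\gamma_i)$.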


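Lemma 4 (Correctness). Let $\Gamma \vdash M : \tau$ be a cr-free HOSC term, let $\Sigma, h, K, \gamma$
be as above, $(\vec{A_i}, \vec{\gamma_i}) \in \mathrm{AVal}_{\Gamma}(\gamma)$, and $c : \tau$ ($c \notin \vec{\circ}$). Then

— $(K[M\{\gamma\}], h) \Downarrow_{err}$ iff there exist $t, c'$ such that $t \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\rho_{\vec{A_i}},c}_{M})$ and
$t^{\perp}\ \bar{\circ}((), c') \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\vec{\gamma_i},c}_{h,K,\gamma})$.
— $(K[M\{\gamma\}], h) \Downarrow_{ter}$ iff there exist $t, A, \sigma$ such that $t \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\rho_{\vec{A_i}},c}_{M})$ and
$t^{\perp}\ \bar{\circ}_{\sigma}(A) \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\vec{\gamma_i},c}_{h,K,\gamma})$.

Moreover, $t$ satisfies $\nu(t) \cap (\vec{\circ} \cup \{\circ\}) = \emptyset$.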


Intuitively, the lemma above confirms that the potential of a term to converge 
is determined by its traces. Accordingly, we have: 


Theorem 1 (Soundness). For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if
$\mathrm{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_2)$ then $\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC}(ciu)}_{err} M_2$.

To prove the converse, we need to know that every odd-length trace generated
by a term actually participates in a contextual interaction. This will follow from
the lemma below. Note that $\Downarrow_{err}$ relies on even-length traces from the context
(Lemma 4).

Lemma 5 (Definability). Suppose $\phi \uplus \{\circ\} \subseteq \mathrm{FNames}$ and $t$ is an even-length
$(\vec{\circ} \uplus \{\circ\}, \phi \uplus \{c\})$-trace starting with an O-action. There exists a passive
configuration $C$ such that the even-length traces in $\mathrm{Tr}_{\mathrm{HOSC}}(C)$ are exactly the even-length
prefixes of $t$ (along with all renamings that preserve types and $\phi \uplus \{c\} \uplus \vec{\circ} \uplus \{\circ\}$,
cf. Remark 2). Moreover, $C = \langle \gamma_{\circ} \cdot [c \mapsto K_{\circ}], \{c \mapsto \circ_{\tau'}\}, \phi \uplus \{c\} \uplus \vec{\circ} \uplus \{\circ\}, h_{\circ} \rangle$,
where $h, K, \gamma$ are built from HOSC syntax.


Proof (Sketch). The basic idea is to use references in order to record all continu- 
ation and function names introduced by the environment. For continuations, the 
use of call/cc, is essential. Once stored in the heap, the names can be accessed 
by terms when needed in P-actions. The availability of throw and references to 
all O-continuations means that arbitrary answer actions can be scheduled when 
needed. 


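Theorem 2 (Completeness). For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$,
$\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC}(ciu)}_{err} M_2$ implies $\mathrm{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_2)$.


Theorems 1, 2 (along with Lemmas 1, 2) imply the following full abstraction
results.


Corollary 1 (HOSC Full Abstraction). Suppose $\Gamma \vdash M_1, M_2$ are cr-free
HOSC terms. Then $\mathrm{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{HOSC}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC}}_{err} M_2$
iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOSC}}_{ter} M_2$.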

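Example 6 (Callback with lock [4]). Recall the term
$\vdash M^{cwl}_1 : ((\mathrm{Unit} \to \mathrm{Unit}) \to \mathrm{Unit}) \times (\mathrm{Unit} \to \mathrm{Int})$ from Example 4, given in Figure 4. We had

\[ t_1 = \bar{c}(\langle g_1, g_2 \rangle)\ g_1(f_1, c_1)\ \bar{f}_1((), c_2)\ c_2(())\ \bar{c}_1(())\ c_2(())\ \bar{c}_1(())\ g_2((), c_3)\ \bar{c}_3(2) \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M^{cwl}_1}). \]

Define $t_2$ to be $t_1$ except that its last action $\bar{c}_3(2)$ is replaced with $\bar{c}_3(1)$.
Observe that $t_1 \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M^{cwl}_1}) \setminus \mathrm{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M^{cwl}_2})$ and
$t_2 \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M^{cwl}_2}) \setminus \mathrm{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M^{cwl}_1})$, i.e. by the Corollary above the terms are
incomparable wrt $\lesssim^{\mathrm{HOSC}}_{err}$. However, they are equivalent wrt $\lesssim^{x}_{err}$ for
$x \in \{\mathrm{GOSC}, \mathrm{HOS}, \mathrm{GOS}\}$ [8].

The above Corollary also provides a handle to reason about equivalence via trace
equivalence. Sometimes this can be done directly on the LTS, especially when $\gamma$
can be kept bounded.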


Example 7 (Counter [28]). For $i \in \{1, 2\}$, consider the terms
$\vdash M_i : (\mathrm{Unit} \to \mathrm{Unit}) \times (\mathrm{Unit} \to \mathrm{Int})$ given by $M_i \equiv \mathrm{let}\ x = \mathrm{ref}\ 0\ \mathrm{in}\ \langle inc_i, get_i \rangle$, where
$inc_1 \equiv (\lambda y.\ x := !x + 1)$, $inc_2 \equiv (\lambda y.\ x := !x - 1)$, $get_1 \equiv \lambda z.\ !x$, $get_2 \equiv \lambda z.\ {-}!x$. In this case,
$\mathrm{Tr}_{\mathrm{HOSC}}(C^{\emptyset,c}_{M_i})$ contains (prefixes of) traces of the form $\bar{c}(\langle g, h \rangle)\ t$, where $t$ is built
from segments of two kinds: either $g((), c_i)\ \bar{c}_i(())$ or $h((), c'_i)\ \bar{c}'_i(n)$, where the
$c_i$'s and $c'_i$'s are pairwise different. Moreover, in the latter case, $n$ must be equal
to the number of preceding actions of the form $g((), c_i)$. For this example, trace
equality could be established by induction on the length of trace. Consequently,
$M_1 \cong^{\mathrm{HOSC}}_{err} M_2$.
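The observation behind Example 7 can be tried out on a small executable model. The sketch below is illustrative only and is not a proof of trace equality: the two counters are rendered as Python closures (the helper names `make_counter` and `observations` are hypothetical), a script plays the role of the sequence of Opponent questions on $g$ and $h$, and the list of returned integers corresponds to the values $n$ answered in the trace.

```python
import random

def make_counter(step, read):
    """Model of  let x = ref 0 in <inc, get>  with a private cell x."""
    heap = {"x": 0}

    def inc():
        heap["x"] = heap["x"] + step

    def get():
        return read(heap["x"])

    return inc, get

def observations(counter, script):
    """Run a script of "inc"/"get" calls and collect the answers to get."""
    inc, get = counter
    answers = []
    for op in script:
        if op == "inc":
            inc()
        else:
            answers.append(get())
    return answers

def m1():
    return make_counter(+1, lambda v: v)      # inc_1, get_1

def m2():
    return make_counter(-1, lambda v: -v)     # inc_2, get_2

rng = random.Random(0)
scripts = [[rng.choice(["inc", "get"]) for _ in range(20)] for _ in range(50)]
same = all(observations(m1(), s) == observations(m2(), s) for s in scripts)
```

In each run, every answer to `get` equals the number of preceding `inc` calls for both counters, in line with the description of the traces above.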


4 GOSC[HOSC] 


Recall that GOSC is the fragment of HOSC in which general storage is restricted
to values of ground type, i.e. arithmetic/boolean constants, the associated
reference names, references to those names and so on. In what follows, we are
going to provide characterizations of $\lesssim^{\mathrm{GOSC}}_{err}$ via trace inclusion. Recall that, by
Lemma 2, $\lesssim^{\mathrm{GOSC}}_{err} = \lesssim^{\mathrm{GOSC}}_{ter}$. Note that we work in an asymmetric setting with
terms belonging to HOSC being more powerful than contexts.

We start off by identifying several technical consequences of the restriction to 
GOSC syntax. First we observe that GOSC internal reductions never contribute 
extra names. 


Lemma 6. Suppose $(M, c, h) \to (M', c', h')$, where $M$ is a GOSC term and $h$
is a GOSC heap. Then $\nu(M) \cup \{c\} \supseteq \nu(M') \cup \{c'\}$.


Proof. By case analysis. All defining rules for $\to$, with the exception of the
$(K[!\ell], h) \to (K[h(\ell)], h)$ rule, are easily seen to satisfy the Lemma (no function
or continuation names are added). However, if the heap is restricted to storing
elements of type $\iota$ (as in GOSC) then $h(\ell)$ will never contain a name, so the
Lemma follows.


The lemma has interesting consequences for the shape of traces generated by
the context configurations $C^{\vec{\gamma_i},c}_{h,K,\gamma}$ if they are built from GOSC syntax. Recall
that P-actions have the form $\bar{f}(A, c')$ or $\bar{c}(A)$, where $f, c$ are names introduced
by O. It turns out that when $h, K, \gamma$ are restricted to GOSC, more can be said
about the origin of the names in traces generated by $C^{\vec{\gamma_i},c}_{h,K,\gamma}$: they will turn out to
come from a restricted set of names introduced by O, which we identify below.
The definition below is based on following the justification structure of a trace;
recall that one action is said to justify another if the former introduces a name
that is used for communication in the latter.


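Definition 13. Suppose $\phi \uplus \{\circ\} \subseteq \mathrm{FNames}$ and $c \in \mathrm{CNames}$. Let $t$ be an
odd-length $(\vec{\circ} \uplus \{\circ\}, \phi \uplus \{c\})$-trace starting with an O-action. The set $\mathrm{Vis}_P(t)$ of
P-visible names of $t$ is defined as follows.

\[
\begin{array}{lcll}
\mathrm{Vis}_P(t\ c'(A')) & = & \{\circ\} \cup \vec{\circ} \cup \nu(A') & c' = c \\
\mathrm{Vis}_P(t\ \bar{f}(A'', c')\ t'\ c'(A')) & = & \mathrm{Vis}_P(t) \cup \nu(A') & \\
\mathrm{Vis}_P(t\ f'(A', c')) & = & \{\circ\} \cup \vec{\circ} \cup \nu(A') \cup \{c'\} & f' \in \phi \\
\mathrm{Vis}_P(t\ \bar{f}(A'', c'')\ t'\ f'(A', c')) & = & \mathrm{Vis}_P(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'') \\
\mathrm{Vis}_P(t\ \bar{c}''(A'')\ t'\ f'(A', c')) & = & \mathrm{Vis}_P(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'')
\end{array}
\]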

Note that, in the inductive cases, the definition follows links between names 
introduced by P and the point of their introduction, names introduced in- 
between are ignored. Here readers familiar with game semantics will notice sim- 
ilarity to the notion of P-view [12]. 

Next we specify a property of traces that will turn out to be satisfied by
configurations corresponding to GOSC contexts.

Definition 14. Suppose $\phi \uplus \{\circ\} \subseteq \mathrm{FNames}$ and $c \in \mathrm{CNames}$. Let $t$ be a
$(\vec{\circ} \uplus \{\circ\}, \phi \uplus \{c\})$-trace starting with an O-action. $t$ is called P-visible if


— for any even-length prefix $t'\ \bar{f}(A, c)$ of $t$, we have $f \in \mathrm{Vis}_P(t')$,
— for any even-length prefix $t'\ \bar{c}(A)$ of $t$, we have $c \in \mathrm{Vis}_P(t')$.


Lemma 7. Consider $C = C^{\vec{\gamma_i},c}_{h,K,\gamma}$, where $h, K, \gamma$ are from GOSC and
$(\vec{A_i}, \vec{\gamma_i}) \in \mathrm{AVal}_{\Gamma}(\gamma)$. Then all traces in $\mathrm{Tr}_{\mathrm{HOSC}}(C)$ are P-visible.


The Lemma above shows that contextual interactions with GOSC contexts rely
on restricted traces. We shall now modify the HOSC[HOSC] LTS to capture the
restriction. Note that, from the perspective of the term, the above constraint
is a constraint on the use of names by O (context), so we need to talk about
O-visible names instead. This dual notion is defined below.


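Definition 15. Suppose $\phi \subseteq \mathrm{FNames}$ and $c \in \mathrm{CNames}$. Let $t$ be a
$(\phi \uplus \{c\}, \emptyset)$-trace of odd length. The set $\mathrm{Vis}_O(t)$ of O-visible names of $t$ is defined as
follows.

\[
\begin{array}{lcll}
\mathrm{Vis}_O(t\ \bar{c}'(A')) & = & \nu(A') & c' = c \\
\mathrm{Vis}_O(t\ f(A'', c')\ t'\ \bar{c}'(A')) & = & \mathrm{Vis}_O(t) \cup \nu(A') & c' \neq c \\
\mathrm{Vis}_O(t\ \bar{f}'(A', c')) & = & \nu(A') \cup \{c'\} & f' \in \phi \\
\mathrm{Vis}_O(t\ f(A'', c'')\ t'\ \bar{f}'(A', c')) & = & \mathrm{Vis}_O(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'') \\
\mathrm{Vis}_O(t\ c''(A'')\ t'\ \bar{f}'(A', c')) & = & \mathrm{Vis}_O(t) \cup \nu(A') \cup \{c'\} & f' \in \nu(A'')
\end{array}
\]

Analogously, a $(\phi \uplus \{c\}, \emptyset)$-trace $t$ is O-visible if, for any even-length prefix
$t'\ f(A, c)$ of $t$, we have $f \in \mathrm{Vis}_O(t')$ and, for any even-length prefix $t'\ c(A)$ of $t$,
we have $c \in \mathrm{Vis}_O(t')$.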


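Example 8. Recall the trace

\[ t_1 = \bar{c}(\langle g_1, g_2 \rangle)\ g_1(f_1, c_1)\ \bar{f}_1((), c_2)\ c_2(())\ \bar{c}_1(())\ c_2(())\ \bar{c}_1(())\ g_2((), c_3)\ \bar{c}_3(2) \]

from previous examples. Observe that

\[ \mathrm{Vis}_O(\bar{c}(\langle g_1, g_2 \rangle)\ g_1(f_1, c_1)\ \bar{f}_1((), c_2)) = \{g_1, g_2, c_2\} \]

\[ \mathrm{Vis}_O(\bar{c}(\langle g_1, g_2 \rangle)\ g_1(f_1, c_1)\ \bar{f}_1((), c_2)\ c_2(())\ \bar{c}_1(())) = \{g_1, g_2\} \]

Consequently, the first use of $c_2(())$ in $t_1$ does not violate O-visibility, but the
second one does.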
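The two sets computed in Example 8 can be reproduced with a direct transcription of Definition 15. The following is a sketch under a hypothetical encoding chosen for illustration (nested tuples for abstract values, a small `Action` record for actions); types are ignored and the input is assumed to be a well-formed trace whose first action is a P-action.

```python
from dataclasses import dataclass
from typing import Optional

def names_of(a):
    """Names occurring in an abstract value encoded as nested tuples."""
    if a[0] == "name":
        return [a[1]]
    if a[0] == "pair":
        return names_of(a[1]) + names_of(a[2])
    return []

@dataclass(frozen=True)
class Action:
    pol: str                       # "P" or "O"
    name: str                      # name used to communicate
    arg: tuple                     # abstract value
    cont: Optional[str] = None     # continuation name (questions only)

def vis_o(t, c, phi):
    """O-visible names of an odd-length prefix t ending with a P-action."""
    last = t[-1]
    sent = set(names_of(last.arg))
    if last.cont is None:                          # P-answer
        if last.name == c:
            return sent
        # position of the O-question that introduced the continuation name
        j = next(i for i, a in enumerate(t)
                 if a.pol == "O" and a.cont == last.name)
        return vis_o(t[:j], c, phi) | sent
    sent.add(last.cont)                            # P-question
    if last.name in phi:
        return sent
    # position of the O-action whose abstract value introduced the function name
    j = next(i for i, a in enumerate(t)
             if a.pol == "O" and last.name in names_of(a.arg))
    return vis_o(t[:j], c, phi) | sent

def is_o_visible(t, c, phi):
    """Every O-action must use a name that is O-visible just before it."""
    return all(a.name in vis_o(t[:i], c, phi)
               for i, a in enumerate(t) if a.pol == "O")

UNIT = ("const", ())
T1 = [
    Action("P", "c", ("pair", ("name", "g1"), ("name", "g2"))),
    Action("O", "g1", ("name", "f1"), "c1"),
    Action("P", "f1", UNIT, "c2"),
    Action("O", "c2", UNIT),
    Action("P", "c1", UNIT),
    Action("O", "c2", UNIT),
    Action("P", "c1", UNIT),
    Action("O", "g2", UNIT, "c3"),
    Action("P", "c3", ("const", 2)),
]

print(sorted(vis_o(T1[:3], "c", set())))   # ['c2', 'g1', 'g2']
print(sorted(vis_o(T1[:5], "c", set())))   # ['g1', 'g2']
print(is_o_visible(T1[:4], "c", set()))    # True
print(is_o_visible(T1[:6], "c", set()))    # False
```

The recursion follows justification pointers back to the O-action that introduced the name used by the last P-action, which is why names introduced in between (here $c_2$, after the answer on $c_1$) drop out of the visible set.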


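In Figure 6, we present a new LTS, called the GOSC[HOSC] LTS, which will
turn out to capture $\lesssim^{\mathrm{GOSC}}_{err}$ through trace inclusion. It is obtained from the
HOSC[HOSC] LTS by restricting O-actions to those that rely on O-visible names.
Technically, this is done by enriching configurations with an additional component
$\mathcal{F}$, which maintains historical information about O-visible names immediately
before each O-action. After each P-action, $\mathcal{F}$ is accessed to calculate the
current set $\mathcal{V}$ of O-visible names according to the definition of O-visibility,
and only O-actions compatible with O-visibility are allowed to proceed (due
to the $f \in \mathcal{V}$, $c \in \mathcal{V}$ side conditions). We write $\mathrm{Tr}_{\mathrm{GOSC}}(C)$ for the set of traces
generated from $C$ in the GOSC[HOSC] LTS.

\[
\begin{array}{ll}
(P\tau) & \langle M, c, \gamma, \xi, \phi, h, \mathcal{F} \rangle \xrightarrow{\ \tau\ } \langle N, c', \gamma, \xi, \phi, h', \mathcal{F} \rangle \\
 & \text{when } (M, c, h) \to (N, c', h') \\[4pt]
(PA) & \langle V, c, \gamma, \xi, \phi, h, \mathcal{F} \rangle \xrightarrow{\bar{c}(A)} \langle \gamma \cdot \gamma', \xi, \phi \uplus \nu(A), h, \mathcal{F}, \mathcal{F}(c) \uplus \nu(A) \rangle \\
 & \text{when } c : \sigma \text{ and } (A, \gamma') \in \mathrm{AVal}_{\sigma}(V) \\[4pt]
(PQ) & \langle K[fV], c, \gamma, \xi, \phi, h, \mathcal{F} \rangle \xrightarrow{\bar{f}(A, c')} \langle \gamma \cdot \gamma' \cdot [c' \mapsto K],\ \xi \cdot [c' \mapsto c],\ \phi \uplus \phi',\ h,\ \mathcal{F},\ \mathcal{F}(f) \uplus \phi' \rangle \\
 & \text{when } f : \sigma \to \sigma',\ (A, \gamma') \in \mathrm{AVal}_{\sigma}(V),\ c' : \sigma' \text{ and } \phi' = \nu(A) \uplus \{c'\} \\[4pt]
(OA) & \langle \gamma, \xi, \phi, h, \mathcal{F}, \mathcal{V} \rangle \xrightarrow{c(A)} \langle K[A], c', \gamma, \xi, \phi \uplus \nu(A), h, \mathcal{F} \cdot [\nu(A) \mapsto \mathcal{V}] \rangle \\
 & \text{when } c \in \mathcal{V},\ c : \sigma,\ A : \sigma,\ \gamma(c) = K,\ \xi(c) = c' \\[4pt]
(OQ) & \langle \gamma, \xi, \phi, h, \mathcal{F}, \mathcal{V} \rangle \xrightarrow{f(A, c)} \langle V A, c, \gamma, \xi, \phi \uplus \phi', h, \mathcal{F} \cdot [\phi' \mapsto \mathcal{V}] \rangle \\
 & \text{when } f \in \mathcal{V},\ f : \sigma \to \sigma',\ A : \sigma,\ c : \sigma',\ \gamma(f) = V \text{ and } \phi' = \nu(A) \uplus \{c\}
\end{array}
\]

Given $N \subseteq \mathrm{Names}$, $[N \mapsto \mathcal{V}]$ stands for the map $[n \mapsto \mathcal{V} \mid n \in N]$.


Fig. 6. GOSC[HOSC] LTS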

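Recall that, given a $\Gamma$-assignment $\rho$, term $\Gamma \vdash M : \tau$ and $c \in \mathrm{CNames}_{\tau}$, the
active configuration $C^{\rho,c}_{M}$ was defined by $C^{\rho,c}_{M} = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset \rangle$. We
need to upgrade it to the LTS by initializing the new component to the empty
map: $C^{\rho,c}_{M,vis} = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset, \emptyset \rangle$.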

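Definition 16. The GOSC[HOSC] trace semantics of a cr-free HOSC term
$\Gamma \vdash M : \tau$ is defined by $\mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M : \tau) = \{((\rho, c), t) \mid \rho \text{ is a } \Gamma\text{-assignment},$
$c : \tau,\ t \in \mathrm{Tr}_{\mathrm{GOSC}}(C^{\rho,c}_{M,vis})\}$.

By construction, it follows that

Lemma 8. $t \in \mathrm{Tr}_{\mathrm{GOSC}}(C^{\rho,c}_{M,vis})$ iff $t \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\rho,c}_{M})$ and $t$ is O-visible.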


Noting that the witness trace $t$ from Lemma 4 is O-visible iff $t^{\perp}\, \bar{\circ}((), c')$ is
P-visible, we can conclude that, for GOSC, the traces relevant to $\Downarrow_{err}$ are O-visible,
which yields:


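Theorem 3 (Soundness). For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if
$\mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2)$ then $\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC}(ciu)}_{err} M_2$.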


To prove the converse, we need a new definability result. This time we are 
only allowed to use GOSC syntax, but the target is also more modest: we are 
only aiming to capture P-visible traces. 


Lemma 9 (Definability). Suppose dW {o} C FNames and t is an even-length 
P-visible (oW {0}, GW {c})-trace starting with an O-action. There exists a passive 
configuration C such that the even-length traces in Tryosc(C) are exactly the 
even-length prefixes of t (along with all renamings that preserve types and dW 
{c} WoW {o}). Moreover, C = (yo + [eH Ko], {c or}, dH {c} WoW {o}, ho), 
where h, K,y are built from GOSC syntax. 
Proof (Sketch). This time we cannot rely on references to recall on demand all
continuation and function names introduced by the environment. However, because
$t$ is P-visible, it turns out that the uses of the names can be captured through variable
bindings ($\lambda x. \cdots$ for function and $\mathrm{call/cc}(x. \cdots)$ for continuation names).
Using throw, we can then force an arbitrary answer action, as long as it uses a 
P-available name. To select the right action at each step, we branch on the value 
of a single global reference of type ref Int that keeps track of the number of steps 
simulated so far. 


Completeness now follows because, for a potential O-visible witness $t$ from
Lemma 4, one can create a corresponding context by invoking the Definability
result for $t^{\perp}\, \bar{\circ}((), c')$. It is crucial that the addition of $\bar{\circ}((), c')$ does not break
P-visibility ($\circ$ is P-visible).

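Theorem 4 (Completeness). For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if
$\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC}(ciu)}_{err} M_2$ then $\mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2)$.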


Altogether, Theorems 3, 4 (along with Lemma 1) imply the following result. 


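Corollary 2 (GOSC Full Abstraction). Suppose $\Gamma \vdash M_1, M_2$ are cr-free
HOSC terms. Then $\mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC}(ciu)}_{err} M_2$
iff $\Gamma \vdash M_1 \lesssim^{\mathrm{GOSC}}_{err} M_2$.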


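Example 9. In the Callback with lock example (Example 6), we exhibited traces
$t_1, t_2$ that separated $M_1^{cwl}$ and $M_2^{cwl}$ with respect to $\lesssim^{\mathrm{HOSC}}_{err}$. Example 8 shows
that neither trace is O-visible, i.e. they do not belong to $\mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_1^{cwl})$ or
$\mathrm{Tr}_{\mathrm{GOSC}}(\Gamma \vdash M_2^{cwl})$. Thus, the two traces cannot be used to separate $M_1^{cwl}, M_2^{cwl}$
with respect to $\lesssim^{\mathrm{GOSC}}_{err}$. As already mentioned, this is in fact impossible: we have
$\vdash M_1^{cwl} \cong^{\mathrm{GOSC}}_{err} M_2^{cwl}$.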

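Example 10 (Well-bracketed state change [4]). Consider the following two terms

$M_1^{wbsc} \triangleq \mathsf{let}\ x = \mathsf{ref}\ 0\ \mathsf{in}\ \lambda f.(x := 0;\ f();\ x := 1;\ f();\ !x)$
$M_2^{wbsc} \triangleq \lambda f.(f();\ f();\ 1)$

of type $\tau = (\mathrm{Unit} \to \mathrm{Unit}) \to \mathrm{Int}$, let

$t_3 = \bar{c}(g)\ \ g(f_1, c_1)\ \ \bar{f_1}((), c_2)\ \ c_2(())\ \ \bar{f_1}((), c_3)\ \ g(f_2, c_4)\ \ \bar{f_2}((), c_5)\ \ c_3(())\ \ \bar{c_1}(0)$

and let $t_4$ be obtained from $t_3$ by changing 0 in the last action to 1. One can
check that both traces are O-visible: in particular, the action $c_3(())$ is not a
violation because

$\mathrm{Vis}_O(\bar{c}(g)\ \ g(f_1, c_1)\ \ \bar{f_1}((), c_2)\ \ c_2(())\ \ \bar{f_1}((), c_3)\ \ g(f_2, c_4)\ \ \bar{f_2}((), c_5)) = \{g, c_3, c_5\}.$

Moreover, $t_3 \in \mathrm{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_1^{wbsc}}) \setminus \mathrm{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_2^{wbsc}})$ and $t_4 \in \mathrm{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_2^{wbsc}}) \setminus \mathrm{Tr}_{\mathrm{GOSC}}(C^{\emptyset,c}_{M_1^{wbsc}})$.
By the Corollary above, we can conclude that $M_1^{wbsc}, M_2^{wbsc}$
are incomparable wrt $\lesssim^{\mathrm{GOSC}}_{err}$. However, they turn out to be $\cong^{\mathrm{HOS}}_{err}$- and $\cong^{\mathrm{GOS}}_{err}$-
equivalent.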
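
The behaviour of the two terms under contexts without control operators can be explored informally by transcribing them into Python. This is only a rough sketch under stated assumptions: Python closures and one-element lists stand in for functions and references, and since Python has no call/cc, every callback returns before its caller continues. The names `make_m1`, `m2` and `results`, and the three sample contexts, are hypothetical and not taken from the paper.

```python
def make_m1():
    """M1: let x = ref 0 in (lambda f. x := 0; f(); x := 1; f(); !x)."""
    x = [0]  # one-element list used as a mutable reference cell

    def m1(f):
        x[0] = 0
        f()
        x[0] = 1
        f()
        return x[0]

    return m1


def m2(f):
    """M2: lambda f. f(); f(); 1."""
    f()
    f()
    return 1


def results(term):
    """Run `term` against a few re-entrant, well-bracketed contexts."""
    noop = lambda: None
    reenter = lambda: term(noop)      # callback that calls the term again
    nested = lambda: term(reenter)    # two levels of re-entrancy
    return [term(noop), term(reenter), term(nested)]


print(results(make_m1()), results(m2))
```

In all three sample contexts both terms return 1, because each nested call finishes (leaving `x` equal to 1) before control returns to the outer call. A finite set of contexts proves nothing about equivalence, but it illustrates why separating the terms requires the environment to answer an older pending call first, as in the action $c_3(())$ of $t_3$.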


366 G. Jaber and A. S. Murawski 


5 HOS[HOSC] 


Recall that HOS is the fragment of HOSC that does not feature continuation 
types and the associated syntax. In what follows we are going to provide alternative
characterisations of $\lesssim^{\mathrm{HOS}}_{err}$ and $\lesssim^{\mathrm{HOS}}_{ter}$ in terms of trace inclusion and
complete trace inclusion respectively.

We start off by identifying several technical consequences of the restriction
to HOS syntax. First we observe that HOS internal reductions never change the
associated continuation name.


Lemma 10. If $(M, c, h) \to (M', c', h')$, $M$ is a HOS term and $h$ is a HOS heap
then $c = c'$.


Proof. The only rule that could change c is the rule for throw, but it is not part 
of HOS. 


The lemma has a bearing on the shape of traces generated by the (passive)
configurations $C^{\gamma,c}_{h,K}$ corresponding to HOS contexts. In the presence of throw
and storage for continuations, it was possible for P to play answers involving
arbitrary continuation names introduced by O. By Lemma 10, in HOS this will
be restricted to the continuation name of the current configuration, which will
restrict the shape of possible traces. Below we identify the continuation name
$\mathrm{top}_P(t)$ that becomes the relevant name after trace $t$. If the last move was an
O-question then the continuation name introduced by that move will become
that name. Otherwise, we track a chain of answers and questions, similarly to
the definition of P-visibility.³

³ Observe that, because $h, K, \gamma$ are from HOS, $C^{\gamma,c}_{h,K}$ will generate $(\{\circ_{\tau'}, \circ\}, \phi \uplus \{c\})$-traces,
where $\tau'$ is the result type of $K$, because $h_{\circ} = h$, $K_{\circ} = K$, $\gamma_{\circ} = \gamma$.


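Definition 17. Suppose $\phi \uplus \{\circ\} \subseteq \mathrm{FNames}$ and $c \in \mathrm{CNames}$. Let $t$ be a
$(\{\circ_{\tau'}, \circ\}, \phi \uplus \{c\})$-trace of odd length starting with an O-action. The continuation
name $\mathrm{top}_P(t)$ is defined as follows.

\[
\begin{array}{rcl}
\mathrm{top}_P(t\ c(A)) & \triangleq & \circ_{\tau'} \\
\mathrm{top}_P(t_1\ \bar{f}(A', c')\ t_2\ c'(A'')) & \triangleq & \mathrm{top}_P(t_1) \\
\mathrm{top}_P(t\ f(A', c')) & \triangleq & c'
\end{array}
\]

We say that a $(\{\circ_{\tau'}, \circ\}, \phi \uplus \{c\})$-trace $t$ starting with an O-action is P-bracketed
if, for any prefix $t'\ \bar{c'}(A)$ of $t$ (i.e. any prefix ending with a P-answer),
we have $c' = \mathrm{top}_P(t')$.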


Lemma 11. Consider C = (alae where h, K,y are from HOS and ( A, J;) € 
AValr (y). Then all traces in Tryosc(C) are P-bracketed. 


The Lemma above characterizes the restrictive nature of contextual inter- 
actions with HOS contexts. Next we shall constrain the HOSC[HOSC] LTS ac- 
cordingly to capture the restriction. Note that, from the point of view of the 
term, the above-mentioned constraint concerns the use of continuation names 
by O (the context), so we need to talk about O-bracketing instead. This dual 
notion of “a top name for O” is specified below. 
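\[
\begin{array}{ll}
(P\tau) & \langle M, c, \gamma, \xi, \phi, h \rangle \xrightarrow{\ \tau\ } \langle N, c', \gamma, \xi, \phi, h' \rangle \\
 & \text{when } (M, c, h) \to (N, c', h') \\[4pt]
(PA) & \langle V, c, \gamma, \xi, \phi, h \rangle \xrightarrow{\ \bar{c}(A)\ } \langle \gamma \cdot \gamma', \xi, \phi \uplus \nu(A), h, c' \rangle \\
 & \text{when } c : \sigma,\ (A, \gamma') \in \mathrm{AVal}_{\sigma}(V),\ \xi(c) = c' \\[4pt]
(PQ) & \langle K[fV], c, \gamma, \xi, \phi, h \rangle \xrightarrow{\ \bar{f}(A, c')\ } \langle \gamma \cdot \gamma' \cdot [c' \mapsto K], \xi \cdot [c' \mapsto c], \phi \uplus \phi', h, c' \rangle \\
 & \text{when } f : \sigma \to \sigma',\ (A, \gamma') \in \mathrm{AVal}_{\sigma}(V),\ c' : \sigma' \text{ and } \phi' = \nu(A) \uplus \{c'\} \\[4pt]
(OA) & \langle \gamma, \xi, \phi, h, c'' \rangle \xrightarrow{\ c(A)\ } \langle K[A], c', \gamma, \xi, \phi \uplus \nu(A), h \rangle \\
 & \text{when } c = c'',\ c : \sigma,\ A : \sigma,\ \gamma(c) = K,\ \xi(c) = c' \\[4pt]
(OQ) & \langle \gamma, \xi, \phi, h, c'' \rangle \xrightarrow{\ f(A, c)\ } \langle VA, c, \gamma, \xi \cdot [c \mapsto c''], \phi \uplus \nu(A) \uplus \{c\}, h \rangle \\
 & \text{when } f : \sigma \to \sigma',\ A : \sigma,\ c : \sigma',\ \gamma(f) = V
\end{array}
\]


Fig. 7. HOS[HOSC] LTS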


Definition 18. Suppose $\phi \subseteq \mathrm{FNames}$ and $c \in \mathrm{CNames}$. Let $t$ be a $(\phi \uplus \{c\}, \emptyset)$-
trace of odd length. The continuation name $\mathrm{top}_O(t)$ is defined as follows. In the
first case, the value is $\bot$ (representing “none”), because $c$ is the top continuation
passed by the environment to the term (if it gets answered there is nothing left
to answer).

\[
\begin{array}{rcl}
\mathrm{top}_O(t\ \bar{c}(A)) & \triangleq & \bot \\
\mathrm{top}_O(t_1\ f(A', c')\ t_2\ \bar{c'}(A'')) & \triangleq & \mathrm{top}_O(t_1) \\
\mathrm{top}_O(t\ \bar{f}(A', c')) & \triangleq & c'
\end{array}
\]

We say that a $(\phi \uplus \{c\}, \emptyset)$-trace $t$ is O-bracketed if, for any prefix $t'\ c'(A)$ of
$t$ (i.e. any prefix ending with an O-answer), we have $c' = \mathrm{top}_O(t')$.
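
The recursive definition of the top continuation for O can be computed in a single left-to-right pass over a trace. The following Python sketch is an illustration under assumptions of our own: an action is encoded as a triple `(player, kind, cont)`, where `kind` is `"Q"` or `"A"` and `cont` is the continuation name introduced (for questions) or answered (for answers); function names and abstract values are dropped because they play no role in bracketing. The names `BOTTOM` and `scan` are hypothetical.

```python
BOTTOM = None  # stands for the special element "bottom" (nothing left to answer)


def scan(trace):
    """Return (top_O of the trace, whether the trace is O-bracketed).

    `trace` is a list of (player, kind, cont) triples starting with a P-action.
    The returned top value is meaningful when the trace ends with a P-action.
    """
    saved = {}        # continuation introduced by an O-question -> top_O just before it
    top = BOTTOM      # top_O of the prefix ending at the latest P-action
    bracketed = True
    for player, kind, cont in trace:
        if player == "O" and kind == "Q":
            saved[cont] = top            # remembered for a later P-answer to cont
        elif player == "O" and kind == "A":
            if cont != top:              # O must answer the current top name
                bracketed = False
        elif player == "P" and kind == "Q":
            top = cont                   # third clause: the fresh name becomes top
        else:                            # P-answer
            # second clause if cont was introduced by O, first clause otherwise
            top = saved.get(cont, BOTTOM)
    return top, bracketed


# P answers c, O asks (introducing c1), P asks (introducing c2), O answers c2.
good = [("P", "A", "c"), ("O", "Q", "c1"), ("P", "Q", "c2"), ("O", "A", "c2")]
# Same prefix, but O answers c1 while c2 is still the top name.
bad = [("P", "A", "c"), ("O", "Q", "c1"), ("P", "Q", "c2"), ("O", "A", "c1")]
print(scan(good[:3]), scan(good)[1], scan(bad)[1])
```

Storing, for each continuation introduced by O, the top name that was current at that moment is the same bookkeeping idea as a map from O-introduced continuations to other continuations; the sketch keeps it in a separate dictionary purely for readability.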


In Figure 7, we present a new LTS, called the HOS[HOSC] LTS, which will
turn out to capture $\lesssim^{\mathrm{HOS}}_{err}$. It is obtained from the HOSC[HOSC] LTS by restricting
O-actions to those that satisfy O-bracketing. Technically, this is done
by enriching passive configurations with a component for storing the current
value of $\mathrm{top}_O(t)$. In order to maintain this information, we need to know which
continuation will become the top one if P plays an answer. This can be done with
a map that maps continuations introduced by O to other continuations. Because
its flavour is similar to $\xi$ (which is a map from continuations introduced by P)
we integrate this information into $\xi$. The $c = c''$ side condition then enforces
O-bracketing. We shall write $\mathrm{Tr}_{\mathrm{HOS}}(C)$ for the set of traces generated from $C$
in the HOS[HOSC] LTS.


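Recall that, given a $\Gamma$-assignment $\rho$, term $\Gamma \vdash M : \tau$ and $c : \tau$, the active
configuration $C^{\rho,c}_{M}$ was defined by $C^{\rho,c}_{M} = \langle M\{\rho\}, c, \emptyset, \emptyset, \nu(\rho) \cup \{c\}, \emptyset \rangle$. We upgrade
it to the new LTS by setting $C^{\rho,c}_{M,\mathrm{bra}} = \langle M\{\rho\}, c, \emptyset, [c \mapsto \bot], \nu(\rho) \cup \{c\}, \emptyset \rangle$. This
initializes $\xi$ in such a way that, after $\bar{c}(A)$ is played, the extra component will
be set to $\bot$, where $\bot$ is a special element not in CNames.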
Definition 19. The HOS[HOSC] trace semantics of a cr-free HOSC term
$\Gamma \vdash M : \tau$ is defined to be $\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M : \tau) = \{((\rho, c), t) \mid \rho$ is a $\Gamma$-assignment,
$c : \tau$, $t \in \mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M,\mathrm{bra}})\}$.


By construction, it follows that 
Lemma 12. $t \in \mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M,\mathrm{bra}})$ iff $t \in \mathrm{Tr}_{\mathrm{HOSC}}(C^{\rho,c}_{M})$ and $t$ is O-bracketed.


Noting that the witness trace $t$ from Lemma 4 is O-bracketed iff $t^{\perp}\, \bar{\circ}((), c')$ is
P-bracketed, we can conclude that, for HOS, the traces relevant to $\Downarrow_{err}$ are
O-bracketed, which yields:


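Theorem 5 (Soundness). For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if
$\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ then $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS}(ciu)}_{err} M_2$.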


For the converse, we establish another definability result, this time for a P- 
bracketed trace. 


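Lemma 13 (Definability). Suppose $\phi \uplus \{\circ\} \subseteq \mathrm{FNames}$ and $t$ is an even-length
P-bracketed $(\{\circ_{\tau'}, \circ\}, \phi \uplus \{c\})$-trace starting with an O-action. There exists a
passive configuration $C$ such that the even-length traces in $\mathrm{Tr}_{\mathrm{HOSC}}(C)$ are exactly
the even-length prefixes of $t$ (along with all renamings that preserve types and
$\phi \uplus \{c, \circ_{\tau'}, \circ\}$). Moreover, $C = \langle \gamma \cdot [c \mapsto K], \{c \mapsto \circ_{\tau'}\}, \phi \uplus \{c, \circ_{\tau'}, \circ\}, h \rangle$, where
$h, K, \gamma$ are built from HOS syntax.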


Proof (Sketch). Our argument for HOSC is structured in such a way that, for a 
P-bracketed trace, there is no need for continuations (throwing and continuation 
capture are not necessary). 


Completeness now follows because, for a potential witness trace $t$ from Lemma 4,
one can create a corresponding context by invoking the Definability result for
$t^{\perp}\, \bar{\circ}((), c')$. It is crucial that the addition of $\bar{\circ}((), c')$ does not break P-bracketing
(it does not, because the action is a question).


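Theorem 6 (Completeness). For any cr-free HOSC terms $\Gamma \vdash M_1, M_2$, if
$\Gamma \vdash M_1 \lesssim^{\mathrm{HOS}(ciu)}_{err} M_2$ then $\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$.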


Altogether, Theorems 5, 6 (along with Lemma 1) imply the following result. 


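Corollary 3 (HOS Full Abstraction). Suppose $\Gamma \vdash M_1, M_2$ are cr-free HOSC
terms. Then $\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS}(ciu)}_{err} M_2$ iff
$\Gamma \vdash M_1 \lesssim^{\mathrm{HOS}}_{err} M_2$.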


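Example 11 (Assignment/callback commutation [27]). For $i \in \{1, 2\}$, let $f :
\mathrm{Unit} \to \mathrm{Unit} \vdash M_i : \mathrm{Unit} \to \mathrm{Unit}$ be defined by:


$M_1 \triangleq \mathsf{let}\ n = \mathsf{ref}(0)\ \mathsf{in}\ \lambda y^{\mathrm{Unit}}.\ \mathsf{if}\ (!n > 0)\ ()\ (n := 1;\ f())$
$M_2 \triangleq \mathsf{let}\ n = \mathsf{ref}(0)\ \mathsf{in}\ \lambda y^{\mathrm{Unit}}.\ \mathsf{if}\ (!n > 0)\ ()\ (f();\ n := 1)$

Operationally, one can see that $f \vdash M_1 \not\lesssim^{\mathrm{HOS}}_{err} M_2$ due to the following HOS context:
$\mathsf{let}\ r = \mathsf{ref}(\lambda y.y)\ \mathsf{in}\ (\mathsf{let}\ f = \lambda y.(!r)()\ \mathsf{in}\ (r := \bullet;\ (!r)()));\ \mathsf{err}$. In our framework,
this is confirmed by the trace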


ts = 9) Oa) Fe) ge) &((), 
which is in $\mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_1}) \setminus \mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_2})$. On the other hand,


te = eg) g(a) F(O,¢2) g(0,c2) F(Q,¢s) 


is in $\mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_2}) \setminus \mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_1})$, so the terms are incomparable. Note, however,
that both traces break O-visibility: specifically, we have 


Viso (Elg) 9((),e1) F(,¢2)) = {02}, 


so the g((),c2) action violates the condition. Consequently, the traces do not 
preclude f + Mı SZ Mə for x € {GOSC, GOS}. 
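
The separating HOS context from this example can be replayed informally in Python. The sketch below rests on assumptions of our own: one-element lists model references, the final `err` is modelled by returning the string `"err"`, and divergence is approximated by Python's `RecursionError`. The names `make_m1`, `make_m2` and `run_in_context` are hypothetical.

```python
def make_m1(f):
    """M1: let n = ref 0 in (lambda y. if !n > 0 then () else (n := 1; f()))."""
    n = [0]

    def m1():
        if n[0] > 0:
            return None
        n[0] = 1          # assignment happens before the callback
        return f()

    return m1


def make_m2(f):
    """M2: let n = ref 0 in (lambda y. if !n > 0 then () else (f(); n := 1))."""
    n = [0]

    def m2():
        if n[0] > 0:
            return None
        f()               # callback happens before the assignment
        n[0] = 1
        return None

    return m2


def run_in_context(make_term):
    """let r = ref (lambda y. y) in (let f = lambda y. (!r)() in (r := TERM; (!r)())); err"""
    r = [lambda: None]
    f = lambda: r[0]()
    r[0] = make_term(f)
    try:
        r[0]()
    except RecursionError:
        return "diverges"  # unbounded re-entrancy, standing in for divergence
    return "err"


print(run_in_context(make_m1), run_in_context(make_m2))
```

With the first term the re-entrant call sees the counter already set and returns, so the context reaches `err`; with the second term the callback re-enters before the counter is updated and the recursion never bottoms out. The context stores a function in a reference, which is exactly the higher-order state that ground-state contexts lack.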


“err 


For $x \in \{\mathrm{HOSC}, \mathrm{GOSC}\}$, $\lesssim^{x}_{err}$ and $\lesssim^{x}_{ter}$ coincide. Intuitively, this is because the
presence of continuations in the context makes it possible to make an escape at 
any point. In contrast, for HOS, the context must run to completion in order to 
terminate. 

At the technical level, one can appreciate the difference when trying to transfer
our results for $\lesssim^{\mathrm{HOS}}_{err}$ to $\lesssim^{\mathrm{HOS}}_{ter}$. Recall that, according to Lemma 4,
$\Downarrow_{ter}$ relies on a witness trace $t$ such that the context configuration generates
$t^{\perp}\, \bar{\circ}_{\tau'}(())$. In HOS, the latter must satisfy P-bracketing, so we need $\mathrm{top}_P(t^{\perp}) = \circ_{\tau'}$.
Note that this is equivalent to $\mathrm{top}_O(t) = \bot$. Consequently, only such traces are
relevant to observing $\Downarrow_{ter}$.

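We shall call an odd-length O-bracketed $(\phi \uplus \{c\}, \emptyset)$-trace $t$ complete if
$\mathrm{top}_O(t) = \bot$. Let us write $\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq_c \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ if we have
$((\rho, c), t) \in \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ whenever $((\rho, c), t) \in \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1)$ and $t$ is complete.
Following our methodology, one can then show: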


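Theorem 7 (HOS Full Abstraction for $\lesssim^{\mathrm{HOS}}_{ter}$). Suppose $\Gamma \vdash M_1, M_2$ are cr-free
HOSC terms. Then $\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq_c \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS}(ciu)}_{ter} M_2$
iff $\Gamma \vdash M_1 \lesssim^{\mathrm{HOS}}_{ter} M_2$.
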
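Example 12. Let $M_1 = \lambda f^{\mathrm{Unit} \to \mathrm{Unit}}.\ f();\ \Omega_{\mathrm{Unit}}$ and $M_2 = \lambda f^{\mathrm{Unit} \to \mathrm{Unit}}.\ \Omega_{\mathrm{Unit}}$.
We will see that $\vdash M_1 \not\lesssim^{\mathrm{HOS}}_{err} M_2$ but $\vdash M_1 \lesssim^{\mathrm{HOS}}_{ter} M_2$. To see this, note that
$\mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_1})$ contains prefixes of $\bar{c}(g)\ g(f, c_1)\ \bar{f}((), c_2)\ c_2(())$, while $\mathrm{Tr}_{\mathrm{HOS}}(C^{\rho,c}_{M_2})$
only those of $\bar{c}(g)\ g(f, c_1)$. Observe that the only complete trace among them
is $\bar{c}(g)$. The trace $t = \bar{c}(g)\ g(f, c_1)\ \bar{f}((), c_2)$ is not complete, because $\mathrm{top}_O(t) = c_2$.
Consequently, $\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \not\subseteq \mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$ but $\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_1) \subseteq_c
\mathrm{Tr}_{\mathrm{HOS}}(\Gamma \vdash M_2)$.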


The theorem above generalizes the characterisation of contextual equivalence 
between HOS terms with respect to HOS contexts [23], where trace completeness 
means both O- and P-bracketing and “all questions must be answered”. Our 
definition of completeness is weaker (O-bracketing + “the top question must 
be answered” ), because it also covers HOSC terms. However, in the presence of 
both O- and P-bracketing, i.e. for HOS terms, they will coincide. 
6 GOS[HOSC]


Recall that GOS features ground state only and, technically, is the intersection 
of GOSC and HOS. Consequently, it follows from the previous sections that GOS 
contexts yield configurations that satisfy both P-visibility and P-bracketing. For 
such traces, the definability result for GOSC yields a GOS context. Thus, in 
a similar fashion to the previous sections, we can conclude that O-visible and 
O-bracketed traces underpin $\lesssim^{\mathrm{GOS}}_{err}$. To define the GOS LTS we simply combine
the restrictions imposed in the previous sections, and define $\mathrm{Tr}_{\mathrm{GOS}}(\Gamma \vdash M)$
analogously. The results on $\lesssim^{\mathrm{HOS}}_{ter}$ from the previous section also carry over to
GOS.


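Theorem 8 (GOS Full Abstraction). Suppose $\Gamma \vdash M_1, M_2$ are cr-free HOSC
terms. Then:


- $\mathrm{Tr}_{\mathrm{GOS}}(\Gamma \vdash M_1) \subseteq \mathrm{Tr}_{\mathrm{GOS}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{GOS}(ciu)}_{err} M_2$ iff
  $\Gamma \vdash M_1 \lesssim^{\mathrm{GOS}}_{err} M_2$.

- $\mathrm{Tr}_{\mathrm{GOS}}(\Gamma \vdash M_1) \subseteq_c \mathrm{Tr}_{\mathrm{GOS}}(\Gamma \vdash M_2)$ iff $\Gamma \vdash M_1 \lesssim^{\mathrm{GOS}(ciu)}_{ter} M_2$ iff
  $\Gamma \vdash M_1 \lesssim^{\mathrm{GOS}}_{ter} M_2$.
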


7 Concluding remarks 


Asymmetry Our framework is able to deal with asymmetric scenarios, where 
programs are taken from HOSC, but are tested with contexts from weaker frag- 
ments. For example, we can compare the following two HOSC programs, where 
$f : ((\mathrm{Unit} \to \mathrm{Unit}) \to \mathrm{Unit}) \to \mathrm{Unit}$ is a free identifier.


let b = ref ff in
callcc(y. f(λg. b := tt; g(); throw () to y);
          if !b then () else div)

callcc(y. f(λg. g(); throw () to y);
          div)


with div representing divergence. The terms happen to be $\cong^{\mathrm{HOS}}_{err}$-equivalent, but
not $\cong^{\mathrm{HOSC}}_{err}$-equivalent.

To see this at the intuitive level, we make the following observations.



— Firstly, we observe that, to distinguish the terms, f should use its argument. 
Otherwise, the value of b will remain equal to ff, and the only subterm that 
distinguishes the terms (‘if !b then () else div’) will play the same role as div 
in the second term. 

— Secondly, if f does use its argument, then b will be set to tt in the first pro- 
gram, raising the possibility of distinguishing the terms. However, if we allow 
HOS contexts only then, since the argument to f was used, it will have to 
run to completion, before ‘if !b then () else div’ is reached. Consequently, we 
will encounter ‘throw () to y’ earlier and never reach ‘if !b then () else div’. 
This is represented by the trace 


$\bar{f}(h, c_1)\ \ h(g, c_2)\ \ \bar{g}((), c_3)\ \ c_3(())\ \ \bar{c}(())$

This trace is O-bracketed, but not P-bracketed since Player uses throw to
answer directly to the initial continuation $c$ rather than $c_2$.

— Finally, if HOSC contexts are allowed, it is possible to reach the subterm 
‘if !b then () else div’ with b set to tt. This is represented by the trace 

$\bar{f}(h, c_1)\ \ h(g, c_2)\ \ \bar{g}((), c_3)\ \ c_1(())\ \ \bar{c}(())$

This trace is not O-bracketed, because $c_1$ is answered rather than $c_3$, like
above. Consequently, the trace witnesses termination of the first term, but 
the second term would diverge during interaction with the same context. 


We plan to explore the opportunities presented by this setting in the future, 
especially with respect to fully abstract translations, for example, from HOSC 
to GOS. 


Richer Types Recall that our full abstraction results are stated for cr-free terms, 
terms with cont- and ref-free types at the boundary. Here we first discuss how 
to extend them to more complicated types. 

To deal with reference type at the boundary, i.e. location exchange, one needs 
to generalize the notion of traces, so that they can carry, for each action, a heap 
representing the values stored in the disclosed part of the heap, as in [23,27]. The 
extension to sum, recursive and empty types seems conceptually straightforward, 
by simply extending the definition of abstract values for these types, following 
the similar notion of ultimate pattern in [24]. The same idea should apply to 
allow continuation types at the boundary. Operational game semantics for an 
extension of HOS with polymorphism has been explored in [15]. 


Innocence On the other hand, all of the languages we considered were stateful. 
In the presence of state, all of the actions that are represented by labels (and 
their order and frequency) can be observed, because they could generate a side-effect.
A natural question to ask is whether the techniques could also be used
to provide analogous theorems for purely functional computation, i.e. contexts
taken from the language PCF. Here, the situation is different. For example, the
terms $f : \mathrm{Int} \to \mathrm{Int} \vdash f(0)$ and $f : \mathrm{Int} \to \mathrm{Int} \vdash \mathsf{if}\ f(0)\ f(0)\ f(0)$ should be
equivalent, even though the sets of their traces are incomparable.

It is known [12] that PCF strategies satisfy a uniformity condition called in- 
nocence. Unfortunately, restricting our traces to “O-innocent ones” (like we did 
with O-visibility and O-bracketing) would not deliver the required characteriza- 
tion. Technically, this is due to the fact that, in our arguments, given a single 
trace (with suitable properties), we can produce a context that induces the given 
trace and no other traces (except those implied by the definition of a trace). For 
innocence, this would not be possible due to the uniformity requirement. It will 
imply that, although we can find a functional context that generates an inno- 
cent trace, it might also generate other traces, which then have to be taken into 
account when considering contextual testing. This branching property makes it 
difficult to capture equivalence with respect to functional contexts explicitly, e.g. 
through traces, which is illustrated by the use of the so-called intrinsic quotient 
in game models of PCF [2,12]. 
8 Related Work


We have presented four operational game models for HOSC, which capture term 
interaction with contexts built from any of the four sublanguages x € {HOSC, 
GOSC, HOS, GOS} respectively. The most direct precursor to this work is 
Laird’s trace model for HOS[HOS] [23]. Other frameworks in this spirit include 
models for objects [18], aspects [16] and system-level code [9]. In [13], Laird’s 
model has been related formally to the denotational game model from [27]. How- 
ever, in general, it is not yet clear how one can move systematically between the 
operational and denotational game-based approaches, despite some promising 
steps reported in [25]. Below we mention other operational techniques for rea- 
soning about contextual equivalence. 

In [31], fully abstract Eager-Normal-Form (enf) Bisimulations are presented 
for an untyped $\lambda$-calculus with store and control, similar to HOSC (but with
control represented using the $\lambda\mu$-calculus). The bisimulations are parameterised
by worlds to model the evolution of store, and bisimulations on contexts are used 
to deal with control. Like our approach, they are based on symbolic evaluation of 
open terms. Typed enf-bisimulations, for a language without store and in control- 
passing style, have been introduced in [24]. Fully-abstract enf-bisimulations are 
presented in [7] for a language with state only, corresponding to an untyped 
version of HOS. Earlier works in this strand include [17,29]. 

Environmental Bisimulations [19,30,32] have also been introduced for lan- 
guages with store. They work on closed terms, computing the arguments that 
contexts can provide to terms using an environment similar to our component 
$\gamma$. They have also been extended to languages with call/cc [34] and delimited
control operators [5,6]. 

Kripke Logical Relations [28,4,8] have been introduced for languages with 
state and control. In [8], a characterization of contextual equivalence for each 
case x[x] (x € {HOSC, GOSC, HOS, GOS}) is given, using techniques called 
backtracking and public transitions, which exploit the absence of higher-order 
store and that of control constructs respectively. Importing these techniques in 
the setting of Kripke Open Bisimulations [14] should allow one to build a bridge 
between the game-semantics characterizations and Kripke Logical Relations. 

Parametric bisimulations [11] have been introduced as an operational tech- 
nique, merging ideas from Kripke Logical Relations and Environmental Bisim- 
ulations. They do not represent functional values coming from the environment 
using names, but instead use a notion of global and local knowledge to compute 
these values, reminiscent of the work on environmental bisimulations. The no- 
tion of global knowledge depends itself on a notion of evolving world. To our 
knowledge, no fully abstract Parametric Bisimulations have been presented. 

A general theory of applicative [21] and normal-form bisimulations [20] has 
been developed, with the goal of being modular with respect to the effects con- 
sidered. While the goal is similar to our work, the papers consider monadic and 
algebraic presentation of effects, trying particularly to design a general theory 
for proving soundness and completeness of such bisimulations. These works com- 
plement ours, and we would like to explore possible connections. 
Abstract. Compositional methods are central to the development and
verification of software systems. They allow breaking down large systems 
into smaller components, while enabling reasoning about the behaviour 
of the composed system. For concurrent and communicating systems, 
compositional techniques based on behavioural type systems have received 
much attention. By abstracting communication protocols as types, these 
type systems can statically check that programs interact with channels 
according to a certain protocol, whether the intended messages are ex- 
changed in a certain order. In this paper, we put on our coalgebraic 
spectacles to investigate session types, a widely studied class of behavi- 
oural type systems. We provide a syntax-free description of session-based 
concurrency as states of coalgebras. As a result, we rediscover type equi- 
valence, duality, and subtyping relations in terms of canonical coinductive 
presentations. In turn, this coinductive presentation makes it possible to 
elegantly derive a decidable type system with subtyping for π-calculus
processes, in which the states of a coalgebra will serve as channel protocols. 
Going full circle, we exhibit a coalgebra structure on an existing session 
type system, and show that the relations and type system resulting from 
our coalgebraic perspective agree with the existing ones. 


Keywords: Session types - Coalgebra - Process calculi - Coinduction. 


1 Introduction 


Communication protocols enable interactions between humans and computers 
alike, yet different scientific communities rely on different descriptions of protocols: 
one community may use textual descriptions, another uses diagrams, and yet 
another may use types. There is then a mismatch, which is fruitful and hindering 
at the same time. Fruitful, because different views on protocols lead to different 
insights and technologies. But hindering, because exactly those insights and 
technologies cannot be easily exchanged. With this paper, we wish to provide a 
view of protocols that opens up new links between communities and that, at the 
same time, contributes new insights into the nature of communication protocols. 
What would such a view of communication protocols be? Software systems
typically consist of concurrent, interacting processes that pass messages over 
channels. Protocols are then a description of the possible exchanges on channels, 
without ever referring to the exact structure of the processes that use the 
channels. Since we may, for example, expect to get an answer only after sending 
a question, it is clear that such exchanges have to happen in an appropriate 
order. Therefore, protocols have to be a state-based abstraction of communication 
behaviour on channels. Because coalgebras provide an abstraction of general 
state-based behaviour, our proposed view of communication protocols becomes: 
model the states of a protocol as states of a coalgebra and let the coalgebra 
govern the exchanges that may happen at each state of the protocol. 


The above view of protocols allows us to model protocols as coalgebras. How- 
ever, protocols are usually not studied for the sake of their description but to 
achieve certain goals: ensuring correct composition of processes, comparing com- 
munication behaviour, or refining and abstracting protocols. Session types [19,20] 
are an approach to communication correctness for processes that pass messages 
along channels. The idea is simple: describe a protocol as a syntactic object (a 
type), and use a type system to statically verify that processes adhere to the 
protocol. This syntactic approach allows the automatic and efficient verification 
of many correctness properties. However, the syntactic approach depends on 
choosing one particular representation of protocols and one particular representa- 
tion of processes. We show in this paper that our coalgebraic view of protocols 
can guarantee correct process composition, and allows us to reason about key 
notions in the world of session types, type equivalence, duality and subtyping, 
while being completely independent of protocol and process representations. 


Our coalgebraic view is best understood by following the distillation process of 
ideas on a concrete session type system by Vasconcelos [37]. Consider the session 
type S = ?int. !bool. end, which specifies the protocol on one endpoint of a 
channel that receives an integer, then outputs a Boolean, and finally terminates 
the interaction. Note that the protocol S specifies three different states: an input
state, an output state, and a final state. Moreover, we note that S specifies only
how the channel is seen from one endpoint; the other endpoint needs to use 
the channel with the dual protocol !int. ?bool. end. Thus, session type systems 
ensure that the states of S are enabled only in the specified order and that the 
two channel endpoints implement dual protocols. 


A state-based reading of session types is intuitive and is already present 
in programming concepts such as typestates [15,32,33], theories of behavioural 
contracts [4,6,7,13], and connections between session types and communicat- 
ing automata [10,25]. The novelty and insight of the coalgebraic view is that 
1. it describes the state-based behaviour of protocols underlying session types, 
supporting unrestricted types and delegation, without adhering to any specific 
syntax or target programming model; 2. it offers a general framework in which 
key notions such as type equivalence, duality, and subtyping arise as instances of 
well-known coinductive constructions; and 3. it allows us to derive type systems 
for specific process languages, like the π-calculus.
Session Coalgebras at Work How does this coalgebraic view of protocols work
for general session types? Consider a “mathematical server” that offers three 
operations to clients: integer multiplication, Boolean negation and quitting. The 
following session type T specifies a protocol to communicate with this server. 


T = μX. &{ mul: ?int. ?int. !int. X,
           neg: ?bool. !bool. X,
           quit: end }


T is a recursive protocol, as indicated by “μX.”, which can be repeated. A client
can choose, as indicated by &, between the three operations (mul, neg, and quit) 
and the protocol then continues with the corresponding actions. For instance, 
after choosing mul, the server requests two integers and, once received, promises 
to send an integer over the channel. We can see states of the protocol T emerging, 
and it remains to provide a coalgebraic view on the actions of the protocol to 
obtain what we will call session coalgebras. 


Figure 1. Protocol of the mathematical server as a session coalgebra 


Fig. 1 depicts a session coalgebra that describes protocol T. It consists of states 
$q_0, \ldots, q_6$, each representing a different state of T, and transitions between these
states to model the evolution of T. Meaning is given to the different states and 
transitions through the labels on the states and transitions. The state labels, 
written in purple at top-left of the state name, indicate the branching type of 
that state. Depending on the branching type, the labels of the transitions bear 
different meanings. For instance, $q_0$ is labelled with “&”, which indicates that this
state initiates an external choice. The labels on the three outgoing transitions for
$q_0$ (mul, neg, quit) correspond then to the possible kinds of message for selecting
one of the branches. Continuing, states $q_1, \ldots, q_5$ are labelled with a request for
data (label ?) or the sending of data (label !), and the outgoing transition labels
indicate the type of the exchanged values (e.g., bool). Finally, state $q_6$ decrees
the end of the protocol. Note that the cyclic character of T occurs as transitions
back to $q_0$; there is no need for an explicit operator to capture recursion.
Figure 2. Session coalgebra for the client's view of the protocol of the mathematical server


A session coalgebra models the view on one channel endpoint, but to correctly 
execute a protocol we also need to consider the dual session coalgebra that 
models the other endpoint’s view. In our example, the dual of Fig. 1 is given 
by the diagram in Fig. 2, which concerns states $s_0, \ldots, s_6$. More precisely, the
states $q_i$ and $s_i$ are pairwise dual in the following sense. The external choice
of $q_0$ becomes an internal choice for $s_0$, expressed through the label ⊕, with
exactly the same labels on the transitions leaving $s_0$. This means that whenever
the server’s protocol is in state $q_0$ and the client’s protocol in state $s_0$, then the
client can choose to send one of the three signals to the server, thereby forcing 
the server protocol to advance to the corresponding state. All other states turn 
from sending states into receiving states and vice versa. We will see that this 
duality relation between states of session coalgebras has a natural coinductive 
description that can be obtained with the same techniques as bisimilarity. The 
duality relation for T will give us then the full picture of the intended protocol. 
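
The state-and-transition reading of the server protocol and its dual can be sketched with plain dictionaries. This is an illustration under assumptions rather than the paper's formal definition: the assignment of state names to branches is our guess (the figure is not reproduced in the text), `"+"` stands for the internal-choice label ⊕, payloads are restricted to basic data types, and the names `SERVER`, `FLIP`, `dual` and `are_dual` are hypothetical.

```python
# Each state maps to (branching type, {transition label: successor state}).
SERVER = {
    "q0": ("&", {"mul": "q1", "neg": "q4", "quit": "q6"}),
    "q1": ("?", {"int": "q2"}),
    "q2": ("?", {"int": "q3"}),
    "q3": ("!", {"int": "q0"}),
    "q4": ("?", {"bool": "q5"}),
    "q5": ("!", {"bool": "q0"}),
    "q6": ("end", {}),
}

FLIP = {"&": "+", "+": "&", "?": "!", "!": "?", "end": "end"}


def dual(coalgebra, old="q", new="s"):
    """Flip every branching type, keep all transition labels, rename the states."""
    rename = lambda state: state.replace(old, new)
    return {
        rename(state): (FLIP[kind], {lbl: rename(nxt) for lbl, nxt in succ.items()})
        for state, (kind, succ) in coalgebra.items()
    }


def are_dual(a, p, b, q, seen=None):
    """Coinductive check: pairs already under consideration are assumed dual."""
    seen = set() if seen is None else seen
    if (p, q) in seen:
        return True
    seen.add((p, q))
    (kind_p, succ_p), (kind_q, succ_q) = a[p], b[q]
    if FLIP[kind_p] != kind_q or succ_p.keys() != succ_q.keys():
        return False
    return all(are_dual(a, succ_p[l], b, succ_q[l], seen) for l in succ_p)


CLIENT = dual(SERVER)
print(CLIENT["s0"][0], are_dual(SERVER, "q0", CLIENT, "s0"))
```

The `seen` set is what makes the check coinductive: revisiting a pair such as `("q0", "s0")` on the way around the loop counts as success, in the same way a bisimulation check would treat it.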

Consider a client that only wants to use multiplication once but can
also handle real numbers as inputs. Such a client would have to follow the protocol given
by the session coalgebra in Fig. 3, with states $r_0, \ldots, r_5$.


r0 --mul--> r1 --int--> r2 --int--> r3 --real--> r4 --quit--> r5   (state labels: ⊕, !, !, ?, ⊕, end)


Figure 3. Session coalgebra that uses only part of a mathematical server


In theories of session types, the protocol of Fig. 2 would be a subtype of this one 
(cf. [17,16]). Concretely, this new client can also follow the subtype protocol, and 
can thus communicate with a server following the protocol of Fig. 1. For session 
coalgebras, we recover the same notion of subtyping by using specific simulation 
relations that allow us to prove that the behaviour of $r_0$ can be simulated by $s_0$. 
Together, simulations and duality provide the foundation of typical session type 
systems. 

Thus far, we have used session types and coalgebras for protocols with simple 
control and with exchanges of simple data values. In contrast, rich session type 
systems can regulate session delegation, the dynamic allocation and exchange of 
channels by processes. Imagine a process that creates a channel, which should 
adhere to some protocol T. From an abstract perspective, the process holds both 
endpoints of the new channel, and has to send one of them to the process it wishes 
to communicate with. To ensure statically that the receiving process respects 
the protocol of this new channel, we need to announce this communication as a 
transmission of the session type T (via an existing channel) and use T to verify 
the receiving process. Session delegation adds expressiveness and flexibility, but 
may cause problems in the characterisation of a correct notion of duality [18]. 
Remarkably, our coalgebraic view of session types makes this characterisation 
completely natural. 


As an example, consider the type T = μX. ?X. X, which models a channel 
endpoint that infinitely often receives channel ends of its own type T. To obtain 
the dual of T, we may naively try to replace the receive with a send, which 
results in the type μX. !X. X. The problem is that the two channel endpoints 
would not agree on the type they are sending or receiving, as any dual type of 
T needs to send messages of type T. Thus, the correct dual of T would be the 
type U = μX. !T. X. Both T and U specify the transmission of non-basic types, 
either the recursion variable X or T, in contrast to the mathematical server that 
merely stipulated the transmission of basic data values (integers or Booleans). 


In our session coalgebras for the mathematical server it sufficed to have simple 
data types and branching labels on transitions. However, to represent T and U 
we will need another mechanism to express session delegation. We observe that a 
transmission in session types consists of the transmitted data and the session 
type that the protocol must continue with afterwards. Thus, a transition out of a 
transmitting state in a session coalgebra encompasses both a data transition and 
a continuation transition. In diagrams of session coalgebras, we indicate the data 
transition by a coloured arrow ——> and an arrow - -- > connecting the data to 
the continuation transition. Using the combined transitions, Fig. 4 redraws the 
multiplication part of the mathematical server in Fig. 1. 


Figure 4. Protocol of mathematical server as session coalgebra 

This way, the transition from $q_1$ to $q_2$ has been replaced by both a data transition 
into a new state q and a continuation transition into $q_2$. Moreover, q has been 
declared as a data state that expects an integer to be exchanged (label int). 

Having added these transitions to our toolbox, we can present the two types T 
and U as session coalgebras. The diagram in Fig. 5 shows such a session coalgebra, 
in which we name the states suggestively T and U. 


Figure 5. Session coalgebra for a recursive type T and its dual U 


Using this presentation as session coalgebras, it is now straightforward to 
coinductively prove that the states T and U are dual: 1. the states have opposite 
actions; 2. their data transitions point to equal types; and 3. their continuations 
are dual by coinduction. Clearly, the last step needs some justification but it will 
turn out that we can appeal to a standard definition of coinduction in terms of 
greatest fixed points. This demonstrates that our coalgebraic view on session 
types makes the definition of duality truly natural and straightforward. 

Up to here, we have discussed session types and coalgebras that are linear, 
i.e., they enforce that protocols complete exactly once. In many situations, one 
also needs unrestricted types, which enable sharing of channels between processes 
that access these channels concurrently. This is the case of a process that offers a 
service for other processes, for instance a web server. Session delegation allows us 
to dynamically create channels and check their protocols, but the shared channel 
for initiating a session [17] has to offer its protocol to an arbitrary number of 
clients. Unrestricted types enable us to specify this kind of service offer. 

As an example, consider a process that provides a channel for communicating 
integers to anyone asking, like a town hall official handing out citizen numbers. 
The type U’ = μX. un !int. X represents the corresponding protocol, where “un” 
qualifies the type !int. X as unrestricted. This allows the process holding the 
end of a channel with type U’ to transmit an integer to any process that is 
connected to the shared channel, without any restriction on their number. It is 
now surprisingly simple to express U’ in our coalgebraic view: we introduce a new 
state label “un” (unrestricted), which expresses that states reachable from this 
state can be used arbitrarily as protocols across different processes connecting to 
a channel that follow the protocol given by those states. The following diagram 
shows a session coalgebra with a state that corresponds to the type U’. 


U’ [un] ---> q [!],   q --data--> int,   q --continuation--> U’ 


Contributions and Related Work. In this paper, we introduce the notion of 
session coalgebra, which justifies the state-based behaviour of session types from 
a coalgebraic perspective. This perspective is novel, although specific state-based 
descriptions of protocols have been considered before [4,6,7,9,10,13,15,25,32,33]. 
Using coalgebra as a unifying framework for session types has two advantages: 
1. session coalgebras can be defined and studied independently from specific 
syntactic formulations, while keeping the operational behaviour of session types; 
and 2. we can uncover the innate coinductive nature of key notions in session 
types, such as duality, subtyping, and type equivalence through standard coal- 
gebraic techniques. In particular, although communicating automata can also 
provide syntax-independent characterisations of session types [10,11], such char- 
acterisations do not support delegation, an expressive feature which is cleanly 
justified in our coalgebraic approach. Coinduction has already been exploited in 
the definition of type equivalence [35], subtyping [17,16] and, especially, duality 
for systems with recursive types [3,18,24]. Unlike ours, these previous definitions 
are language-dependent, as they are tailored to specific process languages and/or 
syntactic variants of the type discipline. Session coalgebras thus enable the 
generalisation of insights and technologies from specific languages to any protocol 
specification that fits under the umbrella of state-based sessions. 


To enable the verification of processes against protocols described by session 
coalgebras, we also contribute a type system for π-calculus processes, in which 
channel types are given by states of an arbitrary session coalgebra. Our type 
system revisits the one by Vasconcelos [37] from our coalgebraic perspective, while 
extending it with subtyping. Moreover, we provide a type checking algorithm for 
that system, provided that the underlying session coalgebra fulfils two intuitive 
conditions. In doing so, we show how a specific type syntax can be equipped with 
a session coalgebra structure and how the two decidability conditions are reflected 
in the type system. This is in contrast to starting with a specific type syntax and 
then employing category theoretical ideas [36], where coinductive session types 
are encoded in a session type system with parametric polymorphism [5]. Instead, 
we show how a session type system can be derived in general from coalgebras. 


Organisation. Throughout the remainder of the paper, we turn the sketched ideas 
into a coalgebraic framework. We introduce in Sec. 2 a concrete session type 
syntax that we will use to illustrate our framework. In Sec. 3, we will define 
session coalgebras as coalgebras for an appropriate functor and show that the 
type system from Sec. 2 can be equipped with a coalgebraic structure. The 
promised coinductive view on type equivalence, duality, subtyping, etc. will be 
provided in Sec. 4. Moreover, we will show that these notions are decidable under 
certain conditions that hold for any reasonable session type syntax, including 
the one from Sec. 2. Up to that point, the session coalgebras have only intrinsic 
meaning and are not associated with any process representation. Section 5 sets 
forth a type system for the π-calculus, in which channels are assigned states of a 
session coalgebra as types. The resulting type system features subtyping and 
algorithmic type checking, presented in Sec. 6. Some final thoughts are gathered 
in Sec. 7. An extended version, available online, collects additional material [22]. 
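
\[
\begin{array}{rcl@{\qquad}rcl}
p & ::= & ?T.T & T & ::= & d \in D \\
  & \mid & !T.T & & \mid & \mathsf{end} \\
  & \mid & \&\{l_i : T_i\}_{i \in I} & & \mid & q\ p \\
  & \mid & \oplus\{l_i : T_i\}_{i \in I} & & \mid & X \in \mathit{Var} \\
q & ::= & \mathsf{lin} \mid \mathsf{un} & & \mid & \mu X.T
\end{array}
\]


Figure 6. Session types over sets of basic data types D and of variables Var 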


2 Session Types 


To motivate the development of session coalgebras, we recall in this section the 
concrete syntax of an existing session type system by Vasconcelos [37]. After 
building up our intuition, we introduce session coalgebras in Sec. 3 to show they 
can represent this concrete type system. 

The types of the system that we will be using are generated by the grammar in 
Fig. 6, relative to a set of basic data types D and a countable set of type variables 
Var. This grammar has three syntactic categories: pretypes p, qualifiers q, and 
session types T. A pretype p is simply a communication action: send (!), receive 
(?), external choice (&), and internal choice (⊕) indexed by a finite set I of 
labels, followed by one or multiple session types. The simplest session types are 
basic data types in D and the completed, or terminated, protocol represented 
by end. A pretype and qualifier also form a session type, written as q p. The 
“lin” qualifier enforces that the communication action p has to be carried out 
exactly once, while the “un” qualifier allows arbitrary use of p. Finally, we can 
form recursive session types with the fixed point operator μ and the use 
of type variables. We use the usual notions of α-equivalence, (capture-avoiding) 
substitution, and free and bound type variables for session types. 

The grammar allows arbitrary recursive types. We let Type be the set of all 
T in which recursive types are contractive and closed, which means that they 
contain no substrings of the form $\mu X_1.\mu X_2 \ldots \mu X_n.X_1$ and no free type variables. 

To lighten notation, we will usually omit the qualifier lin and assume every 
type to finish with end. With these conventions, we write, e.g., ?int instead 
of lin ?int. end and un ?int for a single unrestricted read. 

We assume there is some decidable subtyping preorder $\le_D$ over the basic 
types. A type is a subtype of another if the subtype can be used anywhere where 
the supertype was accepted. In examples, we use the basic types int, real and 
bool, and we assume that int is a subtype of real, as usual. 

An important notion is the unfolding of a session type, which we define next: 


Definition 1 (Unfolding). The unfolding of a recursive type μX.T is defined 
recursively by 

\[ \mathrm{unfold}(\mu X.T) = \mathrm{unfold}(T[\mu X.T/X]). \]

For all other T in Type, unfold is the identity: unfold(T) = T. 

Because we assume that types are contractive, unfold(T) terminates for all T. 
Also, because all types are required to be closed, unfold(T) can never be a variable 
X. Any such variable would have to be bound somewhere before use, meaning it 
would have been substituted. Furthermore, unfolding a closed type always yields 
another closed type, as each removed binder always causes a substitution of the 
bound variable. 
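
As a worked illustration (added here, not part of the formal development), take the type T = μX. ?X. X from Sec. 1 and a nested recursive type S = μX. μY. !int. X, where S is a name chosen only for this example. Applying Def. 1 step by step gives:

```latex
\begin{align*}
\mathrm{unfold}(T) &= \mathrm{unfold}\big((?X.\,X)[T/X]\big) = \mathrm{unfold}(?T.\,T) = {?T.\,T}, \\
\mathrm{unfold}(S) &= \mathrm{unfold}\big((\mu Y.\,!\mathsf{int}.\,X)[S/X]\big)
                    = \mathrm{unfold}(\mu Y.\,!\mathsf{int}.\,S) \\
                   &= \mathrm{unfold}\big((!\mathsf{int}.\,S)[\mu Y.\,!\mathsf{int}.\,S/Y]\big)
                    = {!\mathsf{int}.\,S}.
\end{align*}
```

Both computations stop at a type that does not start with μ. A non-contractive type such as μX. X would instead unfold to itself forever, which is why contractivity is needed for termination.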


3 Session Coalgebra 


Here we will discuss session coalgebras, the main contribution of this paper. The 
idea is that session coalgebras will be coalgebras for a specific functor F, which 
will capture the state labels and the various kinds of transitions that we discussed 
in Sec. 1. An important feature of coalgebras in general, and session coalgebras 
in particular, is that the states can be given by an arbitrary set. We will leverage 
this to define a session coalgebra on the set of types Type introduced in Sec. 2. 

Before coming to the definition, let us briefly recall some minimal notions of 
category theory. We will not require a lot of category theoretical terminology; 
in fact, we will only use the category Set of sets and functions. Moreover, we 
will be dealing with functors F: Set → Set on the category Set. Such a functor 
allows us to map a set X to a set F(X), and functions f: X → Y to functions 
F(f): F(X) → F(Y). To be meaningful, a functor must preserve identities and 
compositions. That is, F maps the identity function $\mathrm{id}_X$: X → X on X to 
the identity on F(X): $F(\mathrm{id}_X) = \mathrm{id}_{F(X)}$; and, given functions f: X → Y and 
g: Y → Z, we must have F(g ∘ f) = F(g) ∘ F(f). 

A central notion is that of the coalgebras for a functor F. A coalgebra is given 
by a pair (X, c) of a set X and a function c: X → F(X). For simplicity, we often 
leave out X and refer to c as the coalgebra. The general idea is that the set X is 
the set of states and that c assigns to every state its one-step behaviour. In the 
case of session coalgebras this will be the state labels and outgoing transitions. 
Given two coalgebras c: X → F(X) and d: Y → F(Y), we say that h: X → Y 
is a homomorphism if d ∘ h = F(h) ∘ c. Coalgebras and their homomorphisms 
form a category, with the same identity maps and composition as in Set. 

We will have to analyse subsets of coalgebras that are closed under transitions. 
Given a coalgebra c: X → F(X), we say that d: Y → F(Y) with Y ⊆ X is a 
subcoalgebra of c if the inclusion map Y → X is a coalgebra homomorphism. 
Note that in this case c(Y) ⊆ F(Y) and thus d is the restriction of c to Y. Hence, 
we also refer to Y as subcoalgebra. The subcoalgebra generated by x ∈ X in c, 
denoted by $\langle x \rangle_c$, is the least subset of X that contains x and is a subcoalgebra 
of c. Intuitively, it is the set of x and all states that are reachable from x. 
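
For a finite coalgebra, the generated subcoalgebra can be computed as a plain reachability search. The sketch below is only an illustration under an assumed encoding: the coalgebra is a Python dict mapping each state to a pair `(label, successors)`, where `successors` is a dict from transition names to states. The function name `generated` and the toy coalgebra are hypothetical.

```python
def generated(c, x):
    """Least set of states that contains x and is closed under the transitions of c."""
    seen, todo = {x}, [x]
    while todo:
        _label, successors = c[todo.pop()]
        for y in successors.values():
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


# Toy coalgebra (an assumption for illustration): a and b form a cycle, c leads into it.
toy = {
    "a": ("loop", {"next": "b"}),
    "b": ("loop", {"next": "a"}),
    "c": ("entry", {"next": "a"}),
}
```

Here `generated(toy, "a")` contains only the two states on the cycle, whereas starting from `"c"` reaches all three states.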

Coming to the concrete case of session coalgebras, we now construct a functor 
that allows us to capture the state labels and the different kinds of transitions. 
Keeping in mind that states of a session coalgebra correspond to states of a 
protocol, we need to be able to label the states with enabled operations. 


Definition 2 (Operations and Polarities). The operation of a state describes 
the action it represents: com marks the transmission (sending or receiving) of a 
value; branch marks an (internal or external) choice; end marks the completed 
protocol; bsc marks a basic data type; and un marks an unrestricted type. States 
that transmit data (labelled with com) or allow for choice (labelled with branch) 
also have a polarity, which can be either in (a receiving action or external choice) 
or out (a sending action or internal choice). We let O be the set of all operations 
O = {com, branch, end, bsc, un} and P the set of polarities P = {in, out}. 


Note that pairs in {com, branch} x P directly correspond to the actions 
of a session type: ? = (com,in), ! = (com,out), & = (branch,in) and ⊕ = 
(branch, out). We will be using these markers to abbreviate the pairs. 

Now that we have the possible operations of a protocol, we need to define the 
transitions that may follow each operation. Recall that the transition at a choice 
state has to be labelled with messages that resolve that choice. We therefore 
assume to be given a set $\mathbb{L}$ of possible choice labels. The variable $l$ will be used 
to refer to an element of $\mathbb{L}$. $\mathcal{P}^{+}_{<\omega}(\mathbb{L})$ is the set of all finite, non-empty subsets of 
$\mathbb{L}$. Variables $L, L_1, L_2, \ldots$ refer to these finite, non-empty subsets of $\mathbb{L}$. 

Our goal is to define a polynomial functor [14] that captures the state labels 
and transitions. This requires some further formal language. First, we let $*$ and 
$d$ be some fixed, distinct objects. Second, given sets X and Y, we denote by $X^Y$ 
the set of all (total) functions from Y to X. Finally, given a family of sets $(X_i)_{i \in I}$ 
indexed by some set I, their coproduct is the set $\coprod_{i \in I} X_i = \{ (i, x) \mid i \in I, x \in X_i \}$. 

We are now ready to define session coalgebras: 


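Definition 3 (Session Coalgebras). Let A and B be sets defined as follows, 
where we recall that D is the set of all basic data types. 

\[
\begin{array}{rll}
A = & \{\mathsf{com}\} \times P & B_{\mathsf{com},p} = \{*, d\} \\
\cup & \{\mathsf{branch}\} \times P \times \mathcal{P}^{+}_{<\omega}(\mathbb{L}) & B_{\mathsf{branch},p,L} = L \\
\cup & \{\mathsf{end}\} & B_{\mathsf{end}} = \emptyset \\
\cup & \{\mathsf{bsc}\} \times D & B_{\mathsf{bsc},d} = \emptyset \\
\cup & \{\mathsf{un}\} & B_{\mathsf{un}} = \{*\}
\end{array}
\]

The polynomial functor F : Set → Set is defined by 

\[ F(X) = \coprod_{a \in A} X^{B_a}, \qquad F(f)(a, g) = (a, f \circ g). \]

A coalgebra (X, c) for the functor F is called a session coalgebra. 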
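
That F, with the action on functions given above, preserves identities and composition can be checked directly. The following short calculation is added as a sanity check and uses only the definition of F(f); here f: X → Y, h: Y → Z and (a, g) ∈ F(X) are arbitrary.

```latex
\begin{align*}
F(\mathrm{id}_X)(a, g) &= (a, \mathrm{id}_X \circ g) = (a, g), \\
F(h \circ f)(a, g) &= (a, (h \circ f) \circ g) = (a, h \circ (f \circ g))
                    = F(h)(a, f \circ g) = (F(h) \circ F(f))(a, g).
\end{align*}
```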


Let us unfold this definition. Given a session coalgebra c: X → F(X) and a 
state x ∈ X, we find in c(x) ∈ F(X) the information of x encoded as a tuple 
(a, f) with a ∈ A and f: $B_a$ → X. From a, we get directly the operation, and 
the polarity for com states, the type of values communicated for bsc states or 
the message labels of branch states. The function f encodes the transitions out 
of x. The domain of f is exactly the set of labels that have a transition, and is 
dependent on the kind of state declared by a. 

It is convenient to partition the domain of the transition map f into data and 
continuations. Notice how only com states have data transitions; for other states, 
all transitions are continuations. As usual, we write dom(f) for the domain of f. 


Definition 4 (Domains). Suppose c(x) = (com, p, f), then the data domain 
of f is $\mathrm{dom}_D(f) = \{d\}$ and the continuation domain is $\mathrm{dom}_C(f) = \{*\}$. In all 
other cases, $\mathrm{dom}_D(f) = \emptyset$ and $\mathrm{dom}_C(f) = \mathrm{dom}(f)$. 
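
To make Defs. 3 and 4 concrete, the following sketch encodes a finite session coalgebra as a Python dict and checks that every state carries a label from A together with exactly the transitions that B prescribes. The encoding is an assumption made for illustration: labels are tuples such as `("com", "out")` or `("branch", "in", frozenset(...))`, the two fixed objects are the strings `"*"` and `"d"`, and all function names are hypothetical. The domains of Def. 4 are computed from the label a, which determines the domain of f.

```python
POLARITIES = {"in", "out"}


def arity(a):
    """B_a: the names of the transitions that a state labelled a must have."""
    op = a[0]
    if op == "com":
        return {"*", "d"}      # continuation and data transition
    if op == "branch":
        return set(a[2])       # one transition per choice label
    if op == "un":
        return {"*"}
    if op in ("end", "bsc"):
        return set()
    raise ValueError(f"unknown operation {op!r}")


def in_A(a):
    """Is the tuple a an element of the label set A?"""
    if a[0] == "com":
        return len(a) == 2 and a[1] in POLARITIES
    if a[0] == "branch":
        return len(a) == 3 and a[1] in POLARITIES and len(a[2]) > 0
    if a[0] == "bsc":
        return len(a) == 2
    return a in (("end",), ("un",))


def is_session_coalgebra(c):
    """Check that c maps every state to some (a, f) with a in A and f: B_a -> states."""
    return all(
        in_A(a) and set(f) == arity(a) and all(y in c for y in f.values())
        for a, f in c.values()
    )


def dom_D(a):
    """Data domain (Def. 4): only com states have a data transition."""
    return {"d"} if a[0] == "com" else set()


def dom_C(a):
    """Continuation domain (Def. 4): all remaining transitions."""
    return arity(a) - dom_D(a)


# The unrestricted type U' = mu X. un !int. X from Sec. 1; q is the underlying send.
example = {
    "U'": (("un",), {"*": "q"}),
    "q": (("com", "out"), {"*": "U'", "d": "int"}),
    "int": (("bsc", "int"), {}),
}
```

With this encoding, `is_session_coalgebra(example)` holds, while dropping the data transition of `q` makes the check fail because the transitions no longer match $B_{com,out}$.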


3.1 Alternative Presentation of Session Coalgebras 


Session coalgebras (X, c) are rather complex. We show how to build up c as the 
combination of two simpler functions, denoted σ and δ, so that c(x) = (σ(x), δ(x)) 
with σ: X → A and δ(x): $B_{\sigma(x)}$ → X. Observe that every state gets an operation 
in O assigned, thus we may assume that there is a map op: X → O. Depending 
on the operation given by op(x), the label on x will then have different other 
ingredients that are captured in the following proposition. 

To formulate the proposition, we need some notation. Suppose f: X → I is a 
map and i ∈ I. We define the fibre $X^f_i$ of f over i to be $X^f_i = \{x \in X \mid f(x) = i\}$. 
Moreover, we let the pairing of functions f and g be $\langle f, g \rangle(x) = (f(x), g(x))$. 


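Proposition 1. A session coalgebra (X, c) can equivalently be expressed by 
providing the following maps: 

\[
\begin{array}{ll}
\mathrm{op} : X \to O & \text{maps each state to an operation} \\
\mathrm{pol} : X^{\mathrm{op}}_{\mathsf{com}} \cup X^{\mathrm{op}}_{\mathsf{branch}} \to P & \text{maps com and branch states to a polarity} \\
\mathrm{la} : X^{\mathrm{op}}_{\mathsf{branch}} \to \mathcal{P}^{+}_{<\omega}(\mathbb{L}) & \text{maps branch states to a set of labels} \\
\mathrm{da} : X^{\mathrm{op}}_{\mathsf{bsc}} \to D & \text{maps bsc states to their basic type} \\
\delta_a : X^{\sigma}_{a} \to X^{B_a} & \text{maps each state to a transition function,}
\end{array}
\]

where 

\[
\sigma(x) =
\begin{cases}
\langle \mathrm{op}, \mathrm{pol} \rangle(x) & \text{if } \mathrm{op}(x) = \mathsf{com} \\
\langle \mathrm{op}, \mathrm{pol}, \mathrm{la} \rangle(x) & \text{if } \mathrm{op}(x) = \mathsf{branch} \\
\langle \mathrm{op}, \mathrm{da} \rangle(x) & \text{if } \mathrm{op}(x) = \mathsf{bsc} \\
\mathrm{op}(x) & \text{if } \mathrm{op}(x) = \mathsf{end} \text{ or } \mathrm{op}(x) = \mathsf{un}
\end{cases}
\]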


We specified $\delta_a$ as a family of transition functions to preserve each specific 
signature. We can define a single global transition function as $\delta(x) = \delta_{\sigma(x)}(x)$. This 
is how the coalgebra finally becomes c(x) = (σ(x), δ(x)). As long as the provided 
maps fit their signatures, this derived function will conform to c : X → F(X). 

The procedure also works backwards: given any session coalgebra, we can 
derive functions op(x), pol(x), etc. from c(x). We will often use op(x), σ(x), and 
δ(x) to refer to those specific parts of an arbitrary session coalgebra. 


3.2 Coalgebra of Session Types 


In Sec. 1, we informally explained how session types can be represented as states 
of a session coalgebra. We will now justify this claim by showing that session 
types are, in fact, states of a specific session coalgebra (Type, $c_{Type}$). 

We define the functions op, pol, δ, and la (see Prop. 1) on Type. Using Prop. 1, 
we can then derive $c_{Type}$ : Type → F(Type). Let us begin with the linear types. 


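Components of $c_{Type}(T)$ for linear types T: 

T                           op(T)   pol(T)  δ(T)                              la(T) 
lin ?T'.T''                 com     in      δ(T)(*) = T'',  δ(T)(d) = T' 
lin !T'.T''                 com     out     δ(T)(*) = T'',  δ(T)(d) = T' 
lin &{l_i : T_i}_{i∈I}      branch  in      δ(T)(l_i) = T_i                   {l_i}_{i∈I} 
lin ⊕{l_i : T_i}_{i∈I}      branch  out     δ(T)(l_i) = T_i                   {l_i}_{i∈I} 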

Under this definition, la(T) is indeed finite, by virtue of an expression being 
a finite string. The completed protocol end and basic types d are straightforward: 
c(end) = (end) and c(d) = (bsc, d) for any d ∈ D. Recursive types are 
handled according to their unfolding, c(μX. T) = c(unfold(μX. T)). Recall that 
contractivity ensures that unfold always terminates. As our types are closed, all 
recursion variables are substituted during the unfolding of their binder. Con- 
sequently, we do not need to define c on these variables. Also note that this 
definition results in an equi-recursive interpretation of recursive types. 

Session types can also be unrestricted, and consist of a pretype p with a 
qualifier un. Session coalgebras have un states to mark unrestricted types; the 
continuation describes what the actual interaction is. Thus, we define op(un p) = 
un and δ(un p)(*) = lin p. 
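
The definition of the coalgebra on types can be sketched as a small function on syntax trees. The encoding below is an assumption made for illustration and not part of the paper: types are nested tuples such as `("com", qualifier, polarity, data, continuation)` or `("mu", "X", body)`, and the names `subst`, `unfold` and `c_type` are hypothetical. Substitution is kept naive, which is safe here because only closed types are substituted.

```python
def subst(t, x, s):
    """Replace the free occurrences of variable x in t by the closed type s."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "mu":
        return t if t[1] == x else ("mu", t[1], subst(t[2], x, s))
    if tag == "com":
        return ("com", t[1], t[2], subst(t[3], x, s), subst(t[4], x, s))
    if tag == "branch":
        return ("branch", t[1], t[2], tuple((l, subst(u, x, s)) for l, u in t[3]))
    return t  # end and basic types contain no variables


def unfold(t):
    """Def. 1: strip leading binders; terminates for contractive types."""
    while t[0] == "mu":
        t = subst(t[2], t[1], t)
    return t


def c_type(t):
    """One-step behaviour (label, transitions) of a closed, contractive type."""
    t = unfold(t)
    tag = t[0]
    if tag == "end":
        return ("end",), {}
    if tag == "bsc":
        return ("bsc", t[1]), {}
    if tag in ("com", "branch") and t[1] == "un":
        # un p: an un state whose continuation is the linear version of p
        return ("un",), {"*": (tag, "lin") + t[2:]}
    if tag == "com":
        return ("com", t[2]), {"d": t[3], "*": t[4]}
    if tag == "branch":
        return ("branch", t[2], frozenset(l for l, _ in t[3])), dict(t[3])
    raise ValueError("open type: free variable reached")


# U' = mu X. un !int. X from Sec. 1
U1 = ("mu", "X", ("com", "un", "out", ("bsc", "int"), ("var", "X")))
```

Applying `c_type` to `U1` yields an un state whose single continuation is the linear send `lin !int. U'`; applying `c_type` again to that continuation gives a com state with a data transition to `int` and a continuation back to `U1`.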


Remark 1 (Alternative Syntaxes and their Functors). The unrestricted session 
types that we have adopted are fairly standard, but they are not the only ones 
in the literature. Most notably, Gay and Hole [17] defined a type ^[T_1, ..., T_n] 
that allows infinite reading and writing. To allow for such behaviour in session 
coalgebra, we can change $B_{un}$ to a set of two elements, such as $\{*_1, *_2\}$. Like 
internal choice, the two transitions describe an option of which behaviour to 
follow, but without sending synchronisation signals. One transition could go to a 
read, and the other to a write, both recursively continuing as the original type 
^[T_1, ..., T_n]. 

It is possible, although not entirely trivial, to change the further definitions 
appropriately and get a decidable type checking algorithm encompassing both 
the syntax presented in this work, and Gay and Hole’s syntax. We choose not to 
do so, in order to keep the presentation simple. 


4 Type Equivalence, Duality and Subtyping 


Up to here, we have represented session types as session coalgebras, but we have 
not yet given a precise semantics to them. As a first step, we will define three 
relations on states: bisimulation, duality, and simulation. Bisimulation is also 
called behavioural equivalence for types; we will show that bisimilar types are 
indeed equivalent. Duality specifies complementary types: it tells us which types 
can form a correct interaction. Simulation will provide a notion of subtyping: 
it tells us when a type can be used where another type was expected. Besides 
relations on session coalgebras, we also introduce the parallelizability of states 
that allows us to rule out certain troubling unrestricted types. Finally, we will 
obtain conditions on coalgebras to ensure the decidability of the three relations 
and therefore the type system that we derive in Sec. 5. 

In the following, we will denote by $\mathrm{Rel}_X$ the poset $\mathcal{P}(X \times X)$ of all relations 
on X ordered by inclusion. Recall that a post-fixpoint of a monotone map 
$g : \mathrm{Rel}_X \to \mathrm{Rel}_X$ is a relation $R \in \mathrm{Rel}_X$ with $R \subseteq g(R)$. Note that $\mathrm{Rel}_X$ is a 
complete lattice and that therefore any monotone map has a greatest post-fixpoint 
by the Knaster-Tarski Theorem [34]. We will define bisimulation, simulation, 
and duality as the greatest (post-)fixpoint of monotone functions, which we will 
therefore call coinductive definitions. This definition turns out to be intuitively 
what we would expect and the interaction of infinite behaviour with other 
type features is automatically correct. The coinductive definitions also give us 
immediately proof techniques for equivalence, duality and subtyping: to show 
that two states are, say, dual we only have to establish a relation that contains 
both states and show that the relation is a post-fixpoint. This technique can 
then be improved in various ways [30] and we will show that it is decidable for 
reasonable session coalgebras. 
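
When X is finite, the greatest fixpoint of a monotone map on relations can be computed by iterating downwards from the full relation X × X until nothing changes. The sketch below illustrates this standard iteration on a toy system; it is not the decision procedure of the paper, and the names `greatest_fixpoint`, `colour` and `step` are hypothetical.

```python
def greatest_fixpoint(g, top):
    """Iterate a monotone map g on relations downwards, starting from `top`."""
    r = frozenset(top)
    while True:
        # For monotone g the iterates already shrink; the intersection
        # only makes termination on a finite lattice obvious.
        nxt = frozenset(g(r)) & r
        if nxt == r:
            return r
        r = nxt


# Toy system: each state has a colour and exactly one successor.
states = [0, 1, 2]
colour = {0: "p", 1: "p", 2: "q"}
step = {0: 1, 1: 0, 2: 2}


def g(R):
    """Relate x and y if they have the same colour and R-related successors."""
    return {(x, y) for x in states for y in states
            if colour[x] == colour[y] and (step[x], step[y]) in R}


top = {(x, y) for x in states for y in states}
result = greatest_fixpoint(g, top)
```

In the toy system the result relates states 0 and 1 to each other and state 2 only to itself. To prove that two specific states are related, it suffices, as described above, to exhibit any post-fixpoint containing the pair; the full iteration is only needed to decide the question.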


4.1 Bisimulation 


Two states of a coalgebra are said to be bisimilar if they exhibit equivalent 
behaviour. We abstract away from the precise structure of a coalgebra and only 
consider its observable behaviour. Two states are bisimilar if their labels are 
equal and if the states at the end of matching transitions are again bisimilar. 
There is one exception to the equality of labels: basic types can be related via 
their pre-order, which does not have to coincide with equality. 

Fix some coalgebra (X, c) and let $c^* : \mathrm{Rel}_{F(X)} \to \mathrm{Rel}_X$ be the binary preimage 
of c defined as 

\[ c^*(R) = \{ (x, y) \mid (c(x), c(y)) \in R \}. \]


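Definition 5. We define the function $f_\sim : \mathrm{Rel}_X \to \mathrm{Rel}_{F(X)}$ as 

\[
\begin{aligned}
f_\sim(R) = {} & \{ ((a, f), (a, f')) \mid (\forall \alpha \in \mathrm{dom}(f))\ f(\alpha)\ R\ f'(\alpha) \} \\
{} \cup {} & \{ ((\mathsf{bsc}, d, f_\emptyset), (\mathsf{bsc}, d', f_\emptyset)) \mid d \le_D d' \wedge d' \le_D d \}
\end{aligned}
\]

where $f_\emptyset : \emptyset \to X$ is the empty function. 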


It can be easily checked that both $c^*$ and $f_\sim$ are monotone maps, and thus so is 
their composition. Thus, the greatest fixpoint in the following definition exists. 


Definition 6. A relation R is called a bisimulation if it is a post-fixpoint of 
$c^* \circ f_\sim$. We call the greatest fixpoint bisimilarity and denote it by ∼. 
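
For a finite session coalgebra, bisimilarity can be computed by refining the full relation until it is stable. The sketch below assumes a dict encoding in which each state maps to `(label, transitions)`, with labels such as `("com", "out")` or `("bsc", "int")`; the parameter `leq` stands for the preorder on basic types and defaults to equality. All names are hypothetical.

```python
def bisimilarity(c, leq=lambda d, e: d == e):
    """Greatest bisimulation on the finite session coalgebra c."""
    def related(x, y, R):
        (a, f), (b, g) = c[x], c[y]
        if a[0] == "bsc" and b[0] == "bsc":
            # basic types are compared via the preorder in both directions
            return leq(a[1], b[1]) and leq(b[1], a[1])
        # otherwise labels must be equal and matching transitions related
        return a == b and all((f[k], g[k]) in R for k in f)

    R = {(x, y) for x in c for y in c}
    while True:
        S = {(x, y) for (x, y) in R if related(x, y, R)}
        if S == R:
            return R
        R = S


# mu X. !int. X, once as a one-state loop (A) and once unrolled into two states (B1, B2)
c = {
    "A": (("com", "out"), {"d": "int", "*": "A"}),
    "B1": (("com", "out"), {"d": "int", "*": "B2"}),
    "B2": (("com", "out"), {"d": "int", "*": "B1"}),
    "int": (("bsc", "int"), {}),
    "E": (("end",), {}),
}
bis = bisimilarity(c)
```

The one-state loop `A` and the unrolled version starting at `B1` are bisimilar, which reflects the equi-recursive reading of recursive types, whereas `A` and the terminated state `E` are not.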


4.2 Duality 


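Duality describes exactly opposite types in terms of their polarity. That is, the 
dual of input is output and the dual of output is input: $\overline{\mathsf{in}} = \mathsf{out}$ and $\overline{\mathsf{out}} = \mathsf{in}$. 
We can extend this to tuples a in A, see Def. 3, with the exception of basic types 
because they do not describe channels: 

\[
\begin{aligned}
\overline{(\mathsf{com}, p)} &= (\mathsf{com}, \overline{p}) & \overline{(\mathsf{end})} &= (\mathsf{end}) \\
\overline{(\mathsf{branch}, p, L)} &= (\mathsf{branch}, \overline{p}, L) & \overline{(\mathsf{un})} &= (\mathsf{un}) \\
\overline{(\mathsf{bsc}, d)} &\ \text{is undefined}
\end{aligned}
\]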


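The next step is to compare transitions. Continuations of $\mathrm{dom}_C(f)$ need to be 
dual. The data types that are sent or received need to be equivalent, hence 
transitions of $\mathrm{dom}_D(f)$ need to go to bisimilar states. We capture this idea with 
the monotone map $f_\perp : \mathrm{Rel}_X \to \mathrm{Rel}_{F(X)}$ defined as follows. 

\[
f_\perp(R) = \{ ((a, f), (\overline{a}, f')) \mid (\forall \alpha \in \mathrm{dom}_C(f))\ f(\alpha)\ R\ f'(\alpha) \text{ and } (\forall \beta \in \mathrm{dom}_D(f))\ f(\beta) \sim f'(\beta) \}
\]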


Definition 7. A relation R is called a duality relation if it is a post-fixpoint of 
$c^* \circ f_\perp$. We call the greatest fixpoint duality and denote it by ⊥. 


It is useful to have a function mapping any x ∈ X to its dual $\overline{x}$, as long 
as duality is defined on x. However, even if duality is defined on x, the dual 
state might not be in X. Thus, we define the dual closure of X as the set 
$X^\perp = X \cup \{ \overline{x} \mid \overline{\sigma(x)} \text{ is defined} \}$, where $\overline{x}$ is understood to be an arbitrary state 
not in X and distinct from $\overline{y}$ for any state y ∈ X with x ≠ y. For any of the 
original states, $c^\perp(x) = c(x)$, but for the new states we define $\sigma^\perp(\overline{x}) = \overline{\sigma(x)}$ and 

\[
\begin{aligned}
\delta^\perp(\overline{x})(\alpha) &= \overline{\delta(x)(\alpha)} && \text{for all } \alpha \in \mathrm{dom}_C(\delta(x)), \text{ and} \\
\delta^\perp(\overline{x})(\beta) &= \delta(x)(\beta) && \text{for all } \beta \in \mathrm{dom}_D(\delta(x)).
\end{aligned}
\]

Thus, the dual closure is a coalgebra such that $x \perp \overline{x}$ for any x. Notice that 
taking a dual twice always yields a bisimilar type, so we can define the duality 
function as an involution, $\overline{\overline{x}} = x$, rather than adding more variables. Clearly, the 
dual closure of a finite set is finite. 


Proposition 2. $x \perp \overline{x}$ for every state x such that $\overline{x}$ is defined. 
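
The recursive types T = μX. ?X. X and U = μX. !T. X from Sec. 1 make a useful test case for the coinductive definition of duality. The sketch below computes the duality relation on a finite session coalgebra under an assumed dict encoding (state to `(label, transitions)`, with `"d"` the data transition and `"*"` the continuation). For brevity, basic types are compared by equality, and all names, including the state `V` for the naive dual μX. !X. X, are hypothetical.

```python
def gfp(c, ok):
    """Refine the full relation on the states of c until `ok` holds for every pair."""
    R = {(x, y) for x in c for y in c}
    while True:
        S = {(x, y) for (x, y) in R if ok(x, y, R)}
        if S == R:
            return R
        R = S


def dual_label(a):
    """Dual of a label in A; None for basic types, where it is undefined."""
    flip = {"in": "out", "out": "in"}
    if a[0] in ("com", "branch"):
        return (a[0], flip[a[1]]) + a[2:]
    if a[0] in ("end", "un"):
        return a
    return None


def duality(c):
    def bis_ok(x, y, R):
        (a, f), (b, g) = c[x], c[y]
        return a == b and all((f[k], g[k]) in R for k in f)

    bis = gfp(c, bis_ok)

    def dual_ok(x, y, R):
        (a, f), (b, g) = c[x], c[y]
        if dual_label(a) is None or dual_label(a) != b:
            return False
        data = {"d"} if a[0] == "com" else set()
        # data transitions must be bisimilar, continuations must be dual
        return all((f[k], g[k]) in (bis if k in data else R) for k in f)

    return gfp(c, dual_ok)


c = {
    "T": (("com", "in"), {"d": "T", "*": "T"}),    # T = mu X. ?X. X
    "U": (("com", "out"), {"d": "T", "*": "U"}),   # U = mu X. !T. X
    "V": (("com", "out"), {"d": "V", "*": "V"}),   # naive dual mu X. !X. X
}
dual = duality(c)
```

The computed relation contains the pair of `T` and `U` in both directions but rejects the naive candidate `V`, because the data transmitted by `V` is not bisimilar to the data expected by `T`.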


4.3 Parallelizability 


Unlike a linear endpoint, a channel endpoint with an unrestricted type may be 
shared between different parallel processes; each of them uses it independently, 
without informing the others. Furthermore, there is no way to coordinate which 
process receives which message. If the unrestricted endpoint sends a message, it 
could be read by a process that just started using the channel, or by a process that 
is almost done using the channel, or by a process that is anywhere in between. 
In practice, this means an unrestricted channel can only perform one kind of 
communication action. However, session coalgebras allow us to define arbitrarily 
complex unrestricted types. For example, μX. un?int. un?bool. X is an element 
of Type, but we know that sending both int and bool over the same unrestricted 
channel causes problems. 


Definition 8. Given a coalgebra (X, c), some subset Y ⊆ X is parallelizable, 
written par(Y), if for every x and y in Y one of the following holds: x ∼ y, 
σ(x) = un, or σ(y) = un. 


We know that un states do not represent communications; any other states, 
though, have to represent the same kind of action. We make this slightly stronger 
by requiring they are pairwise bisimilar. 

Often we are interested in the parallelizability only of a specific state. Recall 
that $\langle x \rangle_c$ denotes the subcoalgebra generated by x ∈ X in c. 


Definition 9. Let $\langle x \rangle_c^{C}$ be the smallest subset of $\langle x \rangle_c$ that contains x and is 
closed under continuation transitions: 

\[
\langle x \rangle_c^{C} = \bigcap \{ Y \subseteq X \mid x \in Y \text{ and } \delta(y)(\alpha) \in Y \text{ for all } y \in Y \text{ and } \alpha \in \mathrm{dom}_C(\delta(y)) \}
\]

A state x is parallelizable, written par(x), if $\langle x \rangle_c^{C}$ is parallelizable. 
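
Defs. 8 and 9 translate directly into a check on finite session coalgebras: collect the states reachable through continuation transitions only, then compare them pairwise. The sketch below uses an assumed dict encoding (state to `(label, transitions)`), compares basic types by equality, and uses hypothetical names throughout. It contrasts the problematic type μX. un?int. un?bool. X with U’ = μX. un !int. X.

```python
def bisimilarity(c):
    """Greatest bisimulation, comparing basic types by equality."""
    R = {(x, y) for x in c for y in c}
    while True:
        S = {(x, y) for (x, y) in R
             if c[x][0] == c[y][0]
             and all((c[x][1][k], c[y][1][k]) in R for k in c[x][1])}
        if S == R:
            return R
        R = S


def cont_closure(c, x):
    """States reachable from x via continuation transitions only (Def. 9)."""
    seen, todo = {x}, [x]
    while todo:
        a, f = c[todo.pop()]
        for k, z in f.items():
            if a[0] == "com" and k == "d":
                continue  # skip the data transition of com states
            if z not in seen:
                seen.add(z)
                todo.append(z)
    return seen


def par(c, x):
    """Def. 8 applied to the continuation closure of x."""
    bis = bisimilarity(c)
    Y = cont_closure(c, x)
    return all((y, z) in bis or c[y][0] == ("un",) or c[z][0] == ("un",)
               for y in Y for z in Y)


c = {
    # mu X. un ?int. un ?bool. X
    "X0": (("un",), {"*": "P"}),
    "P": (("com", "in"), {"d": "int", "*": "X1"}),
    "X1": (("un",), {"*": "Q"}),
    "Q": (("com", "in"), {"d": "bool", "*": "X0"}),
    # U' = mu X. un !int. X
    "U": (("un",), {"*": "q"}),
    "q": (("com", "out"), {"d": "int", "*": "U"}),
    "int": (("bsc", "int"), {}),
    "bool": (("bsc", "bool"), {}),
}
```

The state `X0` is rejected because its closure contains the two receiving states `P` and `Q`, which expect different basic types and are therefore not bisimilar; the state `U` is accepted since its only non-un state is `q`.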


4.4 Simulation and Subtyping 




Intuitively, a coalgebra simulates another if the behaviour of the latter “is 
contained in” the former. Subtyping, originally defined on session types by Gay 
and Hole [17], is a notion of substitutability of types [16]. We will define our 
notion of simulation such that it coincides with subtyping, just like bisimulation 
provides a notion of type equivalence [17]. 

Consider a process that expects a channel of type T = ?real. The process 
reads a value, and expects it to be a real number and treat it as such. We defined 
int as a subtype of real, so the process can operate correctly if it receives an 
integer instead; that is, ?int is a subtype of T. Now consider a process that 
expects a channel of type !int, on which it can send any integer. In this case 
we cannot restrict the channel to a subtype: as all integers are valid where real 
numbers are expected, we can generalise the channel type to !real. 

Now, in the input case the session types are related (in the subtyping relation) 
in the same order as the data types; this is called covariance. For output, the 
order is reversed; this is called contravariance. The same idea holds for labelled 
choices: the subtype of an external choice can have a subset of choices, while 
the subtype of an internal choice can add more options. For all types, it holds 
that states reached through transitions are covariant, i.e., if T is a subtype of U, 
continuations of T must be subtypes of continuations (of the same label) of U. 
The monotone map $h_\sqsubseteq$ in Fig. 7 captures these ideas formally. 


Definition 10. A relation R is called a simulation if it is a post-fixpoint of 
$c^* \circ h_\sqsubseteq$. We call the greatest fixpoint similarity and denote it by ⊑. 
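
\[
\begin{array}{rll}
h_\sqsubseteq(R) = & \{ ((\mathsf{com}, \mathsf{in}, f), (\mathsf{com}, \mathsf{in}, g)) & \mid f(*)\ R\ g(*) \text{ and } f(d)\ R\ g(d) \} \\
\cup & \{ ((\mathsf{com}, \mathsf{out}, f), (\mathsf{com}, \mathsf{out}, g)) & \mid f(*)\ R\ g(*) \text{ and } g(d)\ R\ f(d) \} \\
\cup & \{ ((\mathsf{branch}, \mathsf{in}, L_1, f), (\mathsf{branch}, \mathsf{in}, L_2, g)) & \mid L_1 \subseteq L_2 \text{ and } \forall l \in L_1.\ f(l)\ R\ g(l) \} \\
\cup & \{ ((\mathsf{branch}, \mathsf{out}, L_1, f), (\mathsf{branch}, \mathsf{out}, L_2, g)) & \mid L_2 \subseteq L_1 \text{ and } \forall l \in L_2.\ f(l)\ R\ g(l) \} \\
\cup & \{ ((\mathsf{bsc}, d, f_\emptyset), (\mathsf{bsc}, d', f_\emptyset)) & \mid d \le_D d' \} \\
\cup & \{ ((\mathsf{end}, f_\emptyset), (\mathsf{end}, f_\emptyset)) \} & \\
\cup & \{ ((\mathsf{un}, f), (\mathsf{un}, g)) & \mid f(*)\ R\ g(*), \text{ and } \mathrm{par}(f(*)) \text{ iff } \mathrm{par}(g(*)) \}
\end{array}
\]


Figure 7. Monotone map $h_\sqsubseteq : \mathrm{Rel}_X \to \mathrm{Rel}_{F(X)}$ that defines simulations 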
Figure 8. Simulation for two mathematical server clients (indicated by dotted arrows)


Let us illustrate similarity by means of an example. 


Example 1. Recall the two client protocols for our mathematical server in Figs. 2 
and 3. We can now prove our claim that the latter can also connect to the server 
because it is a supertype of the client protocol in Fig. 2. To do that, we have 
to establish a simulation relation between the states of both client protocols. In 
Fig. 8, we display a part of both session coalgebras side-by-side and indicate with 
dotted arrows the pairs that have to be related by a simulation relation to show 
that these states are similar, that is, related by ⊑. It should be noted that we
simulate states from the second coalgebra by those of the first, that is, we show
s_k ⊑ r_k for the shown states. There is one exception to this, namely q_int ⊑ q_real.


The following proposition records some properties of and tight connections 
between the relations that we introduced. 


Proposition 3. Bisimilarity ∼ is an equivalence relation, duality ⊥ is symmetric,
and similarity ⊑ is a preorder. Moreover, for all states x, y, and z of a
session coalgebra, we have that


1. x ∼ y iff x ⊑ y and y ⊑ x;
2. x ⊥ y and x ⊥ z implies y ∼ z; and
3. x ⊥ y and y ∼ z implies x ⊥ z.
4.5 Decidability


In a practical type checker, we need an algorithm to decide the relations defined 
above. In this subsection we show an algorithm that computes the answer in 
finite time for a certain class of types. 


Definition 11. A coalgebra c is finitely generated if ⟨x⟩_c is finite for all x.


This restriction is not problematic for types, as the following lemma shows.

Lemma 1. The coalgebra of types (Type, c_Type) is finitely generated.


To determine whether two states x and y are bisimilar, we need to determine 
if there exists a bisimulation R with x R y. We start with the simplest relation 
R = {(x, y)}, and ask if this is a bisimulation.

First, we check that for all (u, w) ∈ R, σ(u) = σ(w), or in the case of bsc
states that da(u) ≤_D da(w) and da(w) ≤_D da(u). If σ(u) ≠ σ(w) for any pair in
R we know that no superset of R is a bisimulation, and the algorithm rejects.

Second, we check the matching transitions. For every (u, w) ∈ R and a ∈
dom(δ(u)) we check whether (δ(u)(a), δ(w)(a)) ∈ R. If we encounter a missing
pair, we add it to R and ask whether this new relation is a bisimulation, i.e.,
return to the first step. If all destinations for matching transitions are present in
R, then R is, by construction, a bisimulation containing (x, y). Hence, x ∼ y.

This algorithm tries to construct the smallest possible bisimulation containing
(x, y), by only adding strictly necessary pairs. If the algorithm rejects, there is
no such bisimulation; hence, x ≁ y. Additionally, the algorithm only examines
pairs in ⟨x⟩_c × ⟨y⟩_c. If there are finitely many such pairs, the algorithm will
terminate in finite time.
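
The procedure described above can be sketched as a worklist loop. The sketch assumes that a session coalgebra is given by two Python dictionaries, `sigma` (the operation of each state) and `delta` (the labelled transitions of each state); these names and the encoding are assumptions made for illustration. It also simplifies the treatment of bsc states by comparing operations for equality, which agrees with the two-sided data-type check only when the order on data types is antisymmetric.

```python
def bisimilar(sigma, delta, x, y):
    """Try to build the smallest bisimulation containing (x, y).

    sigma: dict mapping each state to its operation (any hashable value)
    delta: dict mapping each state to a dict {transition label: target state}
    """
    relation = {(x, y)}
    todo = [(x, y)]
    while todo:
        u, w = todo.pop()
        # Step 1: operations must agree; otherwise no superset can work.
        if sigma[u] != sigma[w]:
            return False
        # Guard for this encoding: both states need the same transition labels.
        if delta[u].keys() != delta[w].keys():
            return False
        # Step 2: targets of matching transitions must be related as well.
        for a, target_u in delta[u].items():
            pair = (target_u, delta[w][a])
            if pair not in relation:
                relation.add(pair)   # add the missing pair and re-check later
                todo.append(pair)
    return True


# A one-state loop "receive int forever" versus a two-state unfolding of it.
sigma = {"a": ("com", "in"), "b1": ("com", "in"), "b2": ("com", "in"),
         "c": ("com", "out"), "int": ("bsc", "int")}
delta = {"a": {"*": "a", "d": "int"},
         "b1": {"*": "b2", "d": "int"},
         "b2": {"*": "b1", "d": "int"},
         "c": {"*": "c", "d": "int"},
         "int": {}}
print(bisimilar(sigma, delta, "a", "b1"))  # True
print(bisimilar(sigma, delta, "a", "c"))   # False
```

Each pair is added to `relation` at most once and all pairs come from the states reachable from x and y, so the loop terminates whenever those sets are finite, mirroring the termination argument in the text.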

The above described algorithm can be suitably adapted to similarity and 
duality, which gives us the following result. 


Theorem 1. Bisimilarity, similarity, and duality of any states x and y are
decidable if ⟨x⟩_c and ⟨y⟩_c are finite. Parallelizability of any state x is decidable if
(x)? (Definition 9) is finite. 


Corollary 1. Bisimilarity, similarity, and duality are decidable for c_Type.


5 Typing Rules 


Session types are meant to discipline the behaviour of the channels of an interact- 
ing process, so as to ensure that prescribed protocols are executed as intended. 
Up to here, we have focused on session types (i.e., their representation as session 
coalgebras and coinductively-defined relations on them) without committing to a 
specific syntax for processes. This choice is on purpose: our goal is to provide 
a truly syntax-independent justification for session types. In this section, we 
introduce a syntactic notion of processes and rely on session coalgebras to define 
the typing rules for a session type system. 
P, Q ::= x̄⟨y⟩.P                 output y on channel x
       | x(y).P                 bind input from channel x to variable y
       | x ▷ {l_i : P_i}_{i∈I}   offer choices l_1, l_2, ...
       | x ◁ l.P                make choice l
       | P | Q                  composition
       | !P                     replication
       | 0                      finished process
       | (νxy)P                 channel creation


Figure 9. Process syntax


5.1 A Session π-calculus


The π-calculus is a formal model of interactive computation in which processes
exchange messages along channels (or names) [26,31]. As such, it is an abstract 
framework in which key features such as name mobility, (message-passing) con- 
currency, non-determinism, synchronous communication, and infinite behaviour 
have rigorous syntactic representations and precise operational meaning. We 
consider a session π-calculus based on [37,17], i.e., a variant of the π-calculus
whose operators are tailored to the protocols expressed by session types. 

We assume base sets of variables (x,y, z,...) and values (v,v’,...), which can 
be variables or the Boolean constants (true and false). There is also a set of labels 
L, ranged over by l, l′, .... The syntax of processes (P, Q, ...) is given by the
grammar in Fig. 9. We discuss the salient aspects of the syntax. A process x̄⟨y⟩.P
denotes the output of channel y along channel x, which precedes the execution
of P. Dually, a process x(y).P denotes the input of a value v along channel x,
which precedes the execution of process P[v/y], i.e., the process P in which all
free occurrences of y have been substituted by v. Processes x ▷ {l_i : P_i}_{i∈I} and
x ◁ l.P implement a labelled choice mechanism. Given a finite index set I, process
x ▷ {l_i : P_i}_{i∈I}, known as branching, denotes an external choice: the reception of
a label l_j (with j ∈ I) along channel x precedes the execution of the continuation
P_j. Process x ◁ l.P, known as selection, denotes an internal choice; it is meant to
interact with a complementary branching. Given processes P and Q, process P | Q
denotes their parallel composition, which enables their simultaneous execution.
The process !P, the replication of P, denotes the composition of infinitely many copies of
P running in parallel, i.e., P | P | ···. Process 0 denotes inaction. Finally, process
(νxy)P is arguably the main difference with respect to usual presentations of the
π-calculus, and denotes a restriction operator that declares x and y as covariables,
i.e., as complementary endpoints of the same channel, with scope P.

The operational semantics for processes is defined as a reduction relation
denoted →, by relying on a notion of structural congruence on processes, denoted
≡. Figure 10 defines these two notions. Intuitively, two processes are structurally
congruent if they are identical in behaviour, but not necessarily in structure.
Structural congruence is the smallest congruence relation satisfying the axioms in
Fig. 10 (bottom). We say a process P reduces to Q, written P → Q, when there is a
single execution step yielding Q from P. We comment on the rules in Fig. 10 (top).
R-COM formalizes
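Reduction

\[
(\nu xy)(\overline{x}\langle v\rangle.P \mid y(z).Q \mid R) \longrightarrow (\nu xy)(P \mid Q[v/z] \mid R) \quad \text{[R-COM]}
\]
\[
(\nu xy)(x \triangleleft l_j.P \mid y \triangleright \{l_i : Q_i\}_{i \in I} \mid R) \longrightarrow (\nu xy)(P \mid Q_j \mid R) \quad (j \in I) \quad \text{[R-SYNC]}
\]
\[
\dfrac{P \longrightarrow Q}{(\nu xy)P \longrightarrow (\nu xy)Q}\ \text{[R-RES]}
\qquad
\dfrac{P \longrightarrow Q}{P \mid R \longrightarrow Q \mid R}\ \text{[R-PAR]}
\qquad
\dfrac{P \equiv P' \quad P \longrightarrow Q \quad Q \equiv Q'}{P' \longrightarrow Q'}\ \text{[R-CONG]}
\]

Structural congruence

Parallel composition:
\[
P \mid Q \equiv Q \mid P \qquad (P \mid Q) \mid R \equiv P \mid (Q \mid R) \qquad P \mid 0 \equiv P \qquad {!P} \equiv P \mid {!P}
\]
Scope restriction:
\[
(\nu xy)(\nu vw)P \equiv (\nu vw)(\nu xy)P \qquad (\nu xy)0 \equiv 0 \qquad (\nu xy)P \equiv (\nu yx)P
\]
\[
(\nu xy)(P \mid Q) \equiv ((\nu xy)P) \mid Q \quad \text{if } x \text{ and } y \text{ not free in } Q
\]


Figure 10. Reduction semantics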


the exchange of a value over a channel formed by two covariables. Similarly, R-SYNC
formalises the synchronisation between a branching and a selection that realises
the labelled choice. Rules R-RES and R-PAR are contextual rules, which allow
reduction to proceed under restriction and parallel composition. Finally, Rule
R-CONG says that reduction is closed under structural congruence: we can use
≡ to promote interactions that match the structure of the rules above.
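
To make the reduction rules concrete, the following sketch performs one R-COM or R-SYNC step on a restricted parallel composition (νxy)(P1 | ... | Pn). The tuple encoding of processes and the names `subst` and `step` are assumptions of this sketch. The component list plays the role of the parallel context R, so R-PAR and the commutativity axioms are applied implicitly; substitution assumes that bound names differ from the substituted value, so no capture-avoiding renaming is performed.

```python
# Hypothetical process encoding (an assumption of this sketch):
#   ("nil",)                      0
#   ("out", x, v, P)              output v on x, then P
#   ("in", x, z, P)               input on x bound to z, then P
#   ("sel", x, l, P)              select label l on x, then P
#   ("branch", x, {l: P_l})       offer the labels in the dict on x


def subst(p, v, z):
    """P[v/z]: replace free occurrences of z by v (assumes no capture)."""
    tag = p[0]
    ren = lambda n: v if n == z else n
    if tag == "nil":
        return p
    if tag == "out":
        _, x, w, cont = p
        return ("out", ren(x), ren(w), subst(cont, v, z))
    if tag == "in":
        _, x, w, cont = p
        # The binder w shadows z inside the continuation.
        return ("in", ren(x), w, cont if w == z else subst(cont, v, z))
    if tag == "sel":
        _, x, l, cont = p
        return ("sel", ren(x), l, subst(cont, v, z))
    if tag == "branch":
        _, x, branches = p
        return ("branch", ren(x), {l: subst(q, v, z) for l, q in branches.items()})
    raise ValueError(tag)


def step(x, y, procs):
    """One reduction of (nu x y)(procs[0] | procs[1] | ...), or None if stuck."""
    for i, p in enumerate(procs):
        for j, q in enumerate(procs):
            if i == j or {p[1:2][0] if len(p) > 1 else None,
                          q[1:2][0] if len(q) > 1 else None} != {x, y}:
                continue
            rest = [r for k, r in enumerate(procs) if k not in (i, j)]
            if p[0] == "out" and q[0] == "in":          # R-COM
                return [p[3], subst(q[3], p[2], q[2])] + rest
            if p[0] == "sel" and q[0] == "branch" and p[2] in q[2]:  # R-SYNC
                return [p[3], q[2][p[2]]] + rest
    return None
```

For instance, with `procs = [("out", "x", "true", ("nil",)), ("in", "y", "z", ("out", "w", "z", ("nil",)))]`, the call `step("x", "y", procs)` yields `[("nil",), ("out", "w", "true", ("nil",))]`: the value sent on x replaces z in the receiver's continuation.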


5.2 Typing Rules 


Based on the above, variables P, Q will refer to processes, x,y,z will range over 
channels and T,U,V are states of some fixed, but arbitrary, session coalgebra 
(X, c). Variables are associated with these states in a context Γ, as described
by Γ ::= ∅ | Γ, x : T. A context is an unordered, finite set of pairs, that may
have at most one pair (x, T) for each variable x. A context is thus isomorphic
to a (partial) function from a finite set of variables to their types. We use Γ to
denote this isomorphic function as well: Γ(x) = T if (x, T) ∈ Γ. The domain of
a context is defined accordingly. 
We know ‘un’ types are unrestricted, but they are not the only ones. 


Definition 12. A type is unrestricted, written un(T), if its operation is un, end 
or bsc. A context is unrestricted, written un(Γ), if all types in Γ are unrestricted,
i.e., if (x, T) ∈ Γ implies un(T). A type is linear, written lin(T), if it is not
unrestricted. A context is linear, if all its types are linear. 


A context Γ may be split into two parts Γ1 and Γ2, such that the linear types
are strictly divided between Γ1 and Γ2, but unrestricted types can be copied.
Context split is a ternary relation, defined by the axioms in Fig. 11. We may
write Γ1 ∘ Γ2 to refer to a context Γ for which Γ = Γ1 ∘ Γ2 is in the context
split relation. Such a context is not necessarily defined for any given contexts;
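\[
\dfrac{}{\emptyset = \emptyset \circ \emptyset}
\qquad
\dfrac{\Gamma = \Gamma_1 \circ \Gamma_2 \quad \mathrm{un}(T)}{\Gamma, x : T = (\Gamma_1, x : T) \circ (\Gamma_2, x : T)}
\]
\[
\dfrac{\Gamma = \Gamma_1 \circ \Gamma_2}{\Gamma, x : T = (\Gamma_1, x : T) \circ \Gamma_2}
\qquad
\dfrac{\Gamma = \Gamma_1 \circ \Gamma_2}{\Gamma, x : T = \Gamma_1 \circ (\Gamma_2, x : T)}
\]

Figure 11. Context Split
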
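\[
\dfrac{\mathrm{un}(\Gamma)}{\Gamma \vdash 0}\ \text{[T-INACT]}
\qquad
\dfrac{\Gamma, x : T, y : U \vdash P \quad T \perp U}{\Gamma \vdash (\nu xy)P}\ \text{[T-RES]}
\]
\[
\dfrac{\Gamma_1 \vdash P \quad \Gamma_2 \vdash Q}{\Gamma_1 \circ \Gamma_2 \vdash P \mid Q}\ \text{[T-PAR]}
\qquad
\dfrac{\Gamma \vdash P \quad \mathrm{un}(\Gamma)}{\Gamma \vdash {!P}}\ \text{[T-REP]}
\]
\[
\dfrac{c(T) = (?, f) \quad \Gamma, x : f(*), y : U \vdash P \quad f(d) \sqsubseteq U}{\Gamma, x : T \vdash x(y).P}\ \text{[T-IN]}
\]
\[
\dfrac{c(T) = (!, f) \quad \Gamma, x : f(*) \vdash P \quad U \sqsubseteq f(d)}{\Gamma, x : T, y : U \vdash \overline{x}\langle y\rangle.P}\ \text{[T-OUT]}
\]
\[
\dfrac{c(T) = (\&, L_1, f) \quad L_1 \subseteq L_2 \quad \Gamma, x : f(l) \vdash P_l \quad \forall l \in L_1}{\Gamma, x : T \vdash x \triangleright \{l : P_l\}_{l \in L_2}}\ \text{[T-BRANCH]}
\]
\[
\dfrac{c(T) = (\oplus, L, f) \quad \Gamma, x : f(l) \vdash P \quad l \in L}{\Gamma, x : T \vdash x \triangleleft l.P}\ \text{[T-SEL]}
\]
\[
\dfrac{c(T) = (\mathrm{un}, f) \quad \mathrm{par}(T) \quad \Gamma, x : f(*) \vdash P}{\Gamma, x : T \vdash P}\ \text{[T-UNPACK]}
\]


Figure 12. Declarative Typing Rules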


we implicitly assume its existence when writing Γ1 ∘ Γ2. Notice that the use
of Γ, x : T in the third rule of Fig. 11 carries the assumption that x is not in Γ.
Otherwise, Γ, x : T would have two pairs with x, which is not allowed.
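
The partiality of Γ1 ∘ Γ2 can be made explicit with a short sketch. It assumes contexts are Python dictionaries from variable names to types and that unrestrictedness is supplied as a predicate `un`; both are assumptions of this sketch, as is the function name `combine`. Following the rules as printed in Fig. 11, a variable that occurs on one side only is accepted whatever its type, while a variable on both sides must have been copied and therefore needs an unrestricted type.

```python
def combine(g1, g2, un):
    """Return the context g with g = g1 ∘ g2, or None if no such context exists."""
    merged = dict(g1)
    for x, t in g2.items():
        if x in g1:
            # A shared variable must have been copied, which requires un(T)
            # and the same type on both sides.
            if g1[x] != t or not un(t):
                return None
        merged[x] = t
    return merged


# Hypothetical unrestrictedness predicate: completed sessions and base data.
un = lambda t: t in {"end", "int"}

print(combine({"x": "?int"}, {"y": "end"}, un))   # {'x': '?int', 'y': 'end'}
print(combine({"v": "int"}, {"v": "int"}, un))    # {'v': 'int'}
print(combine({"x": "?int"}, {"x": "?int"}, un))  # None
```

The last call returns `None` because the linear channel x would occur in both parts, which is exactly the situation the split relation rules out.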

The type system is defined by the rules in Fig. 12. A process P is well-formed, 
under a context Γ, if there is some inference tree whose root is Γ ⊢ P and whose
nodes are all valid instantiations of these type rules. As T-INACT is the only rule 
that does not depend on the correctness of another process, it forms the leaves 
of such trees. For well-formed processes, the type system guarantees that: 


— If the process terminates, then all linear sessions were completed. 

— If a process reads a value from a channel, the value has the type specified by 
the channel’s session type. If a process receives a label, it is one of the labels 
specified by the channel’s session type. 


We discuss the typing rules, which can be conveniently read keeping in mind 
the notations introduced in Def. 3 and Prop. 1. T-INACT ensures that all linear 
channels in the context are interacted with until the type becomes unrestricted. If 
our context contains a variable x of type ?int, then the process is required to read 
an int from it. Thus, x : ?int ⊬ 0. In contrast, process x(z).0 is well-formed
for the same context, using T-INACT and T-IN:


\[
\dfrac{x : \mathrm{end},\ z : \mathrm{int} \vdash 0}{x : {?}\mathrm{int} \vdash x(z).0}
\]


T-RES creates a channel by binding together two covariables x and y, of dual 
type. T-PAR causes unrestricted channels to be copied and linear channels to 
get split between composite processes, ensuring the latter occur in only a single 
process. Recall that replication !P is an infinite composition of a single process P, 
hence, a replicated process can only use unrestricted channels. Together, T-PAR 
and T-RES allow us to introduce new covariables, with new types, and distribute
them. But, only unrestricted types may be copied. Notice that a process does 
not specify which types to give the newly bound variables. 


v : int ⊢ (νxy) x(z).0 | ȳ⟨v⟩.0
x : un ?int ⊢ x(z).0 | x(z).0
x : ?int ⊬ x(z).0 | x(z).0


Each action on a channel has its own rule: T-IN handles input, binding the 
channel x to the continuation type and y to some supertype of the received type. 
T-OUT handles output, which requires the sent variable to have a subtype of
whatever type the channel expects to send. T-BRANCH handles external choice, 
where the process needs to offer at least all choices the type describes, coupled 
with processes that are correctly typed under the respective continuation types. 
T-SEL only checks whether the single label that was chosen by the process was a 
valid option, and if the rest of the process is correct under the continuation type. 

These rules are only specified for linear states; T-UNPACK allows a un state 
to be used as if it was the underlying type, as long as it is parallelizable (Def. 8). 

We can actually create structures with un that do not have a syntactical 
equivalent. For example, let T_end be a state with σ(T_end) = un and δ(T_end)(*) =
T_end. Just like regular end, T_end allows no interactions on the channel, but it
does not cause a “un” type to be unparallelizable.


Figure 13. Session coalgebra using an alternative completed protocol


The diagram in Fig. 13 describes a parallelizable unrestricted state T such that 
each copy of a channel in state T can only do a single receive. However, because 
it is unrestricted, we can still copy the channel across threads and read a value 
per copy. We can even read infinitely many values through replication.


x : T ⊬ x(y1).x(y2).x(y3).0
x : T ⊢ x(y1).0 | x(y2).0 | x(y3).0
x : T ⊢ !(x(y).0)


Such a type might be interesting in combination with session delegation. A linear 
session could be established by receiving a channel from an unrestricted channel. 
By using a structure like T, each thread is guaranteed to establish at most one 
private session, but there can be many of such sessions in parallel threads. 

In Sec. 4, we defined simulation through the intuition of subtyping as sub- 
stitutability in one direction. We see that substitution is indeed allowed for 
simulated types. 


Theorem 2. The following, more common, rule is admissible from the rules in 
Fig. 12. 
\[
\dfrac{\Gamma, x : T \vdash P \quad U \sqsubseteq T}{\Gamma, x : U \vdash P}
\]


That is, we could add the rule as an axiom, without changing the set of typable 
processes. As a corollary, bisimulation of states implies the states are equivalent 
with respect to the type system. 


Corollary 2. For all bisimilar types T ∼ U, contexts Γ and processes P, it
holds that Γ, x : T ⊢ P if and only if Γ, x : U ⊢ P.


6 Algorithmic Type Checking 


The type rules describe what well-formed processes look like, but do not directly 
allow us to decide whether an arbitrary process is well-formed or not. This is 
because, beforehand, we do not know: 


1. Which type to introduce in reading (T-IN) or scope restriction (T-RES), or
2. How to split the context in composite processes (T-PAR). 


Rather than trying to infer the introduced types, we augment the language 
of processes with type annotations: 


P ::= ... | (νxy : T) P | x(y : T).P


We only need to annotate one type for scope restrictions, as we can create the 
other with the duality function. Other productions are kept unchanged. 

When checking a process P | Q, we pass along the entire context to P, keeping 
track of all linear variables used, and remove those from the context given to 
Q. To do this we add an output to the algorithm: in an execution Γ1 ⊢ P; Γ2,
output Γ2 is the subset of Γ1 containing only those variables of the input which



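\[
\dfrac{}{\Gamma \div \emptyset = \Gamma}
\qquad
\dfrac{\Gamma_1 \div F = \Gamma_2, x : T \quad \mathrm{un}(T)}{\Gamma_1 \div (F, x) = \Gamma_2}
\qquad
\dfrac{\Gamma_1 \div F = \Gamma_2 \quad x \notin \mathrm{dom}(\Gamma_2)}{\Gamma_1 \div (F, x) = \Gamma_2}
\]

Figure 14. Context Difference
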
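\[
\dfrac{}{\Gamma \vdash 0 ;\ \Gamma}\ \text{[A-INACT]}
\qquad
\dfrac{\Gamma_1 \vdash P ;\ \Gamma_2 \quad \Gamma_1 = \Gamma_2}{\Gamma_1 \vdash {!P} ;\ \Gamma_2}\ \text{[A-REP]}
\]
\[
\dfrac{\Gamma_1 \vdash P ;\ \Gamma_2 \quad \Gamma_2 \vdash Q ;\ \Gamma_3}{\Gamma_1 \vdash P \mid Q ;\ \Gamma_3}\ \text{[A-PAR]}
\qquad
\dfrac{\Gamma_1, x : T, y : \overline{T} \vdash P ;\ \Gamma_2}{\Gamma_1 \vdash (\nu xy : T)P ;\ \Gamma_2 \div \{x, y\}}\ \text{[A-RES]}
\]
\[
\dfrac{c(T) = (?, f) \quad f(d) \sqsubseteq U \quad \Gamma_1, y : U, x : f(*) \vdash P ;\ \Gamma_2}{\Gamma_1, x : T \vdash x(y : U).P ;\ \Gamma_2 \div \{x, y\}}\ \text{[A-IN]}
\]
\[
\dfrac{c(T) = (!, f) \quad U \sqsubseteq f(d) \quad \Gamma_1, x : f(*) \vdash P ;\ \Gamma_2}{\Gamma_1, x : T, y : U \vdash \overline{x}\langle y\rangle.P ;\ \Gamma_2 \div \{x\}}\ \text{[A-OUT]}
\]
\[
\dfrac{c(T) = (\&, L_1, f) \quad L_1 \subseteq L_2 \quad \Gamma_1, x : f(l) \vdash P_l ;\ \Gamma_l \quad \Gamma_2 = \Gamma_l \div \{x\} \quad \forall l \in L_1}{\Gamma_1, x : T \vdash x \triangleright \{l : P_l\}_{l \in L_2} ;\ \Gamma_2}\ \text{[A-BRANCH]}
\]
\[
\dfrac{c(T) = (\oplus, L, f) \quad \Gamma_1, x : f(l) \vdash P ;\ \Gamma_2 \quad l \in L}{\Gamma_1, x : T \vdash x \triangleleft l.P ;\ \Gamma_2 \div \{x\}}\ \text{[A-SEL]}
\]
\[
\dfrac{c(T) = (\mathrm{un}, f) \quad \mathrm{par}(T) \quad \Gamma_1, x : f(*) \vdash P ;\ \Gamma_2}{\Gamma_1, x : T \vdash P ;\ (\Gamma_2 \div \{x\}), x : T}\ \text{[A-UNPACK]}
\]


Figure 15. Algorithmic Type Checking Rules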


had unrestricted types or were not used in P. We say subset because we want 
these variables, if present, to have the same type in Γ2 as in Γ1.

Figure 15 lists the algorithmic versions of the type rules. A-PAR, for example, 
checks parallel processes as described. By construction, Γ2 is one part of the
context split required to instantiate T-PAR. The linear variables of the other
part are exactly those which are present in Γ1 but not in Γ2. This change in
A-PAR requires adjusting the other rules. Firstly, we need the algorithm to
accept even when we do not fully complete all sessions of Γ1 in P. We do this by
unconditionally accepting the terminated process. Note that acceptance of the 
algorithm now only implies well-formedness if the returned context is unrestricted. 

Secondly, the algorithm needs to remove linear variables from the output as 
we use them. We do not, however, want to remove any variable that has a linear 
type, as that would allow us to accept processes which do not complete all linear 
sessions. Thus, we introduce the context difference operator ÷ in Fig. 14. Γ ÷ {x}
is the context of all variable/type pairs in Γ minus a potential pair including x,
but is only defined if (x, T) ∈ Γ implies that T is unrestricted.




We elaborate on A-BRANCH; the algorithm is called once for every branch, 
yielding a context Γ_l each time. Excluding x, each branch must use the exact
same set of linear variables. Thus, we require that all these contexts are equal up
to a potential (x, U) pair. Specifically, there is some Γ2 such that Γ2 = Γ_l ÷ {x}
for any l ∈ L1; this Γ2 is the output context.

To motivate this, consider a type T = &{a : T_un, b : end}, where T_un is some
unrestricted type distinct from end, and some process P = x ▷ {a : 0, b : 0}.
Let Γ be some unrestricted context. Then 0 is well-formed for both Γ, x : T_un and
Γ, x : end; the algorithm agrees.


Γ, x : T_un ⊢ 0; (Γ, x : T_un)
Γ, x : end ⊢ 0; (Γ, x : end)


The resulting contexts are not equal. P is well-formed for Γ, so we have to allow x
to have different types in the output of different branches in a complete algorithm.
A-IN, A-OUT, and A-SEL do not have multiple branches to check, but the ideas
are similar. When introducing a new variable, either through a read or scope 
restriction, the new variable is also removed from the output. A-UNPACK only 
unpacks unrestricted types. We want those to have the same type in the input as 
in the output, so we remove the variable and add a pair with the original type. 
Take, for example, the process 


x : ?int, y : ?int ⊢ x(z1).0 | y(z2).0


The variables are split correctly, and both split contexts are unrestricted when 
the process is completed, thus it is well-formed. 

If, on the other hand, the left process did not complete the linear session, 
then the context difference would not have been defined. Take one such process: 


x : ?int.?int, y : ?int ⊬ x(z1).0 | y(z2).0

We succeed in checking the terminated process of the left part.

x : ?int, y : ?int ⊢ 0; (x : ?int, y : ?int)


But x has a linear type in the output. (x : ?int, y : ?int) ÷ {x} is undefined, so
the algorithm rejects this input entirely. The process was indeed not well-formed, 
and no further parallel processes could fix it; the rejection is expected. 
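
The two examples above can be replayed with a deliberately reduced checker. This is a sketch under several stated assumptions, not the algorithm of Fig. 15: it covers only inaction, parallel composition and input, uses finite types encoded as tuples, drops subtyping and type annotations (the bound variable simply receives the payload type), omits un states, and assumes that bound names are fresh. All function names are illustrative.

```python
# Hypothetical type encoding: ("?", payload, continuation), "end", or "int".
UNRESTRICTED = {"end", "int"}   # assumption: completed sessions and base data


def un(t):
    return t in UNRESTRICTED


def difference(gamma, names):
    """Context difference; raises if a named variable still has a linear type."""
    result = dict(gamma)
    for x in names:
        if x in result:
            if not un(result[x]):
                raise TypeError(f"linear session on {x} not completed")
            del result[x]
    return result


def check(gamma, proc):
    """Return the output context, or raise TypeError on rejection."""
    tag = proc[0]
    if tag == "nil":                 # inaction: accept unconditionally
        return dict(gamma)
    if tag == "par":                 # parallel: thread the context through
        _, p, q = proc
        return check(check(gamma, p), q)
    if tag == "in":                  # input, without subtyping
        _, x, y, cont = proc
        t = gamma.get(x)
        if not (isinstance(t, tuple) and t[0] == "?"):
            raise TypeError(f"{x} is not an input channel")
        _, payload, after = t
        inner = {k: v for k, v in gamma.items() if k != x}
        inner.update({x: after, y: payload})
        return difference(check(inner, cont), {x, y})
    raise TypeError(f"unsupported process {tag}")


def well_formed(gamma, proc):
    """Accept only if the algorithm succeeds with an unrestricted output."""
    try:
        return all(un(t) for t in check(gamma, proc).values())
    except TypeError:
        return False


READ = ("?", "int", "end")           # ?int
READ2 = ("?", "int", READ)           # ?int.?int
proc = ("par", ("in", "x", "z1", ("nil",)), ("in", "y", "z2", ("nil",)))
print(well_formed({"x": READ, "y": READ}, proc))    # True
print(well_formed({"x": READ2, "y": READ}, proc))   # False
```

In the second call the left component leaves x at type ?int, so removing x from its output context is undefined and the whole process is rejected, as in the text. The helper `well_formed` also reflects the remark that acceptance implies well-formedness only when the returned context is unrestricted: `well_formed({"x": READ}, ("nil",))` is false although `check` itself succeeds.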

For each process and context there is at most one applicable algorithmic rule: 
which one is directed by the process syntax and unrestrictedness of a channel 
being interacted with. 

Under the same assumptions as before (i.e., the session coalgebra describing 
the types is finitely generated), this induced type checking algorithm is decidable, 
sound, and complete with respect to the type rules defined in Sec. 5. 


Theorem 3 (Decidability). The type checking algorithm terminates in finite 
time for every input, assuming a finitely generated session coalgebra. 
Having defined algorithmic typechecking, we can go back to the language
that we used to define our typing rules by erasing type annotations in input and
restriction operators. Let erase(·) denote a function on processes defined as


erase((νxy : T)Q) = (νxy) erase(Q)
erase(x(y : T).Q) = x(y).erase(Q)


and as a homomorphism on the remaining process constructs. We have:


Theorem 4 (Correctness). For any context Γ1 and annotated process P,
Γ1 ⊢ erase(P) iff Γ1 ⊢ P; Γ2 and un(Γ2).
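
The erasure function used in the theorem is a plain structural recursion. The sketch below assumes a tuple encoding of annotated and plain processes; the constructor tags such as "res_ann" and "in_ann" are hypothetical names chosen for this illustration.

```python
def erase(p):
    """Drop type annotations from restrictions and inputs, recursively."""
    tag = p[0]
    if tag == "res_ann":            # (nu x y : T) Q  becomes  (nu x y) erase(Q)
        _, x, y, _type, q = p
        return ("res", x, y, erase(q))
    if tag == "in_ann":             # x(y : T).Q  becomes  x(y).erase(Q)
        _, x, y, _type, q = p
        return ("in", x, y, erase(q))
    # Remaining constructs: erase acts as a homomorphism.
    if tag == "nil":
        return p
    if tag == "out":
        _, x, v, q = p
        return ("out", x, v, erase(q))
    if tag == "sel":
        _, x, label, q = p
        return ("sel", x, label, erase(q))
    if tag == "branch":
        _, x, branches = p
        return ("branch", x, {l: erase(q) for l, q in branches.items()})
    if tag == "par":
        _, left, right = p
        return ("par", erase(left), erase(right))
    if tag == "rep":
        _, q = p
        return ("rep", erase(q))
    raise ValueError(f"unknown process constructor: {tag}")
```

Applied to an annotated process such as `("res_ann", "x", "y", "T", ("in_ann", "x", "z", "int", ("nil",)))`, the function returns `("res", "x", "y", ("in", "x", "z", ("nil",)))`.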


7 Concluding Remarks 


We have developed a new, language-independent foundation for session types 
by relying on coalgebras. We introduced session coalgebras, which elegantly cap- 
ture all communication structures of session types, both linear and unrestricted, 
without committing to a specific syntactic formulation for processes and types. 
Session coalgebras allow us to rediscover language-independent coinductive defin- 
itions for duality, subtyping, and type equivalence. A key idea is to assimilate 
channel types to the states of a session coalgebra; we demonstrated this insight by 
deriving a session type system for the π-calculus, which revisits and extends that
by Vasconcelos [37], unlocking decidability results and algorithmic type checking. 

Interesting strands for future work include extending our coalgebraic toolbox 
so as to give a language-independent justification to advanced session type systems, 
such as context-free session types [35] and multiparty session types [21]. Another 
line concerns extending our coalgebraic view to include language-dependent is- 
sues and properties that require a global analysis on session behaviours. Salient 
examples are liveness properties such as (dead)lock-freedom and progress: ad- 
vanced type systems [23,29,28,8] typically couple (session) types with advanced 
mechanisms (such as priority-based annotations and strict partial orders), which 
provide a global insight to rule out the circular dependencies between sessions 
that are at the heart of stuck processes. Lastly, the whole area of coalgebra now 
becomes available to explore session types. One possible direction is to make use 
of final coalgebras and modal logic, which would allow us to analyse the behaviour 
of session coalgebras. This would be particularly powerful in combination with 
composition operations for session coalgebras to break down protocols and type 
checking. Another direction is to use session coalgebras to verify other coalgebras 
that take on the role of the syntactic π-calculus [12,27] and thereby allowing
also for the exploration of other semantics like manifest sharing [1,2] without 
resorting to a specific syntax. 
Abstract. Probabilistic programming is an approach to reasoning under
uncertainty by encoding inference problems as programs. In order
to solve these inference problems, probabilistic programming languages 
(PPLs) employ different inference algorithms, such as sequential Monte 
Carlo (SMC), Markov chain Monte Carlo (MCMC), or variational meth- 
ods. Existing research on such algorithms mainly concerns their imple- 
mentation and efficiency, rather than the correctness of the algorithms 
themselves when applied in the context of expressive PPLs. To rem- 
edy this, we give a correctness proof for SMC methods in the context 
of an expressive PPL calculus, representative of popular PPLs such as 
WebPPL, Anglican, and Birch. Previous work has studied correctness
of MCMC using an operational semantics, and correctness of SMC and 
MCMC in a denotational setting without term recursion. However, for 
SMC inference—one of the most commonly used algorithms in PPLs as 
of today—no formal correctness proof exists in an operational setting. In 
particular, an open question is whether the resample locations in a probabilistic
program affect the correctness of SMC. We solve this fundamental problem,
and make four novel contributions: (i) we extend an untyped PPL
lambda calculus and operational semantics to include explicit resample 
terms, expressing synchronization points in SMC inference; (ii) we prove, 
for the first time, that subject to mild restrictions, any placement of the 
explicit resample terms is valid for a generic form of SMC inference; (iii) 
as a result of (ii), our calculus benefits from classic results from the SMC 
literature: a law of large numbers and an unbiased estimate of the model 
evidence; and (iv) we formalize the bootstrap particle filter for the cal- 
culus and discuss how our results can be further extended to other SMC 
algorithms. 
1 Introduction


Probabilistic programming is a programming paradigm for probabilistic mod- 
els, encompassing a wide range of programming languages, libraries, and plat- 
forms [5,13,14,25,32,37,38]. Such probabilistic models are typically created to 
express inference problems, which are ubiquitous and highly significant in, for 
instance, machine learning [1], artificial intelligence [31], phylogenetics [29,30], 
and topic modeling [2]. 

In order to solve such inference problems, an inference algorithm is required. 
Common general-purpose algorithm choices for inference problems include se- 
quential Monte Carlo (SMC) methods [9], Markov chain Monte Carlo (MCMC) 
methods [12], and variational methods [42]. In traditional settings, correctness 
results for such algorithms often come in the form of laws of large numbers, 
central limit theorems, or optimality arguments. However, for general-purpose 
probabilistic programming languages (PPLs), the emphasis has predominantly 
been on algorithm implementations and their efficiency [14,25,37], rather than 
the correctness of the algorithms themselves. In particular, explicit connections 
between traditional theoretical SMC results and PPL semantics have been lim- 
ited. In this paper, we bridge this gap by formally connecting fundamental SMC 
results to the context of an expressive PPL calculus. 

Essentially, SMC works by simulating many executions of a probabilistic 
program concurrently, occasionally resampling the different executions. In this 
resampling step, SMC discards less likely executions, and replicates more likely 
executions, while remembering the average likelihood at each resampling step in 
order to estimate the overall likelihood. In expressive PPLs, there is freedom in 
choosing where in a program this resampling occurs. For example, most SMC 
implementations, such as WebPPL [14], Anglican [43], and Birch [25], always 
resample when all executions have reached a call to the weighting construct in 
the language. At possible resampling locations, Anglican takes a conservative 
approach by dynamically checking during runtime if all executions have either 
stopped at a weighting construct, or all have finished. If neither of these two cases
applies, a runtime error is reported. In contrast, WebPPL does not perform any checks
and simply includes the executions that have finished in the resampling step. 
There are also heuristic approaches [21] that automatically align resampling lo- 
cations in programs, ensuring that all executions finish after encountering the 
same number of them. The motivations for using the above approaches are all 
based on experimental validation. As such, an open research problem is whether 
there are any inherent restrictions when selecting resampling locations, or if the 
correctness of SMC is independent of this selection. This is not only important 
theoretically to guarantee the correctness of inference results, but also for
inference performance, both since inference performance is affected by the placement
of resampling locations [21] and since dynamic checks result in direct runtime
overhead. We address this research problem in this paper. 
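
To make the description of SMC above concrete, the following is a minimal sketch that resamples after every weighting step and tracks the evidence estimate as the product of average weights. It is a generic textbook-style illustration written in Python, not the formal algorithm developed in this paper; the interface (a list of step functions returning a new state and a non-negative weight) and all names are assumptions of this sketch.

```python
import random


def smc(step_fns, n_particles, rng):
    """Resample-after-every-weight SMC over a list of step functions.

    Each step function maps (state, rng) to (new_state, weight), weight >= 0.
    Returns the final particles and an estimate of the overall likelihood.
    """
    particles = [None] * n_particles
    evidence = 1.0
    for step in step_fns:
        results = [step(s, rng) for s in particles]
        states = [s for s, _ in results]
        weights = [w for _, w in results]
        total = sum(weights)
        if total == 0.0:
            return [], 0.0           # every execution was ruled out
        # Remember the average likelihood at this resampling step.
        evidence *= total / n_particles
        # Multinomial resampling: likely executions are replicated,
        # unlikely ones tend to be discarded.
        particles = rng.choices(states, weights=weights, k=n_particles)
    return particles, evidence


def coin(state, rng):
    """Sample a fair coin and give weight 1 to heads, 0 to tails."""
    heads = rng.random() < 0.5
    return heads, 1.0 if heads else 0.0
```

With a step that always returns weight 0.5, two steps give an evidence estimate of exactly 0.25 regardless of the random seed, while the `coin` step leaves only executions that sampled heads after resampling. Here every execution reaches the same sequence of resampling points by construction; the question studied in this paper is what happens when that alignment is not guaranteed.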

In the following, we give an overview of the paper and our contributions. In 
Section 2, we begin by giving a motivating example from phylogenetics, illus- 
trating the usefulness of our results. Next, in Section 3, we define the syntax and 
operational semantics of an expressive functional PPL calculus based on the op-
erational formalization in Borgström et al. [3], representative of common PPLs. 
The operational semantics assign to each pair of term t and initial random trace 
(sequences of random samples) a non-negative weight. This weight is accumu- 
lated during evaluation through a weight construct, which, in current calculi 
and implementations of SMC, is (implicitly) always followed by a resampling. 
To decouple resampling from weighting, we present our first contribution. 


(i) We extend the calculus from Borgström et al. [3] to include explicit resample 
terms, expressing explicit synchronization points for performing resampling 
in SMC. With this extension, we also define a semantics which limits the 
number of evaluated resample terms, laying the foundation for the remaining 
contributions. 


In Section 4, we define the probabilistic semantics of the calculus. The weight 
from the operational semantics is used to define unnormalized distributions ⟨⟨t⟩⟩
over traces and ⟦t⟧ over result terms. The measure ⟦t⟧ is called the target
measure, and finding a representation of this is the main objective of inference
algorithms.

We give a formal definition of SMC inference based on Chopin [6] in Section 5. 
This includes both a generic SMC algorithm, and two standard correctness re- 
sults from the SMC literature: a law of large numbers [6], and the unbiasedness 
of the likelihood estimate [26]. 

In Section 6, we proceed to present the main contributions. 


(ii) From the SMC formulation by Chopin [6], we formalize a sequence of
distributions ⟨⟨t⟩⟩_n, indexed by n, such that ⟨⟨t⟩⟩_n allows for evaluating at most
n resamples. This sequence is determined by the placement of resamples
in t. Our first result is Theorem 1, showing that ⟨⟨t⟩⟩_n eventually equals
⟨⟨t⟩⟩ if the number of calls to resample is upper bounded. Because of the
explicit resample construct, this also implies that, for all resample
placements such that the number of calls to resample is upper bounded, ⟨⟨t⟩⟩_n
eventually equals ⟨⟨t⟩⟩. We further relax the finite upper bound restriction
and investigate under which conditions lim_{n→∞} ⟨⟨t⟩⟩_n = ⟨⟨t⟩⟩ pointwise. In
particular, we relate this equality to the dominated convergence theorem in
Theorem 2, which states that the limit converges as long as there exists a
function dominating the weights encountered during evaluation. This gives
an alternative set of conditions under which ⟨⟨t⟩⟩_n converges to ⟨⟨t⟩⟩ (now
asymptotically, in the number of resamplings n).


The contribution is fundamental, in that it provides us with a sequence of
approximating distributions ⟨⟨t⟩⟩_n of ⟨⟨t⟩⟩ that can be targeted by the SMC algorithm
of Section 5. As a consequence, we can extend the standard correctness results 
of that section to our calculus. This is our next contribution. 


(iii) Given a suitable sequence of transition kernels (ways of moving between the 
⟨⟨t⟩⟩_n), we can correctly approximate ⟨⟨t⟩⟩_n with the SMC algorithm from
Section 5. The approximation is correct in the sense of Section 5: the law of 


Correctness of Sequential Monte Carlo for Probabilistic Programming 407 


large numbers and the unbiasedness of the likelihood estimate holds. As a 
consequence of (ii), SMC also correctly approximates ((t)), and in turn the 
target measure |t]. Crucially, this also means estimating the model evidence 
(likelihood), which allows for compositionality [15] and comparisons between 
different models [30]. This contribution is summarized in Theorem 3. 


Related to the above contributions, Ścibior et al. [33] formalize SMC and
MCMC inference as transformations over monadic inference representations
using a denotational approach (in contrast to our operational approach). They
prove that their SMC transformations preserve the measure of the initial rep- 
resentation of the program (i.e., the target measure). Furthermore, their for- 
malization is based on a simply-typed lambda calculus with primitive recursion, 
while our formalization is based on an untyped lambda calculus which naturally 
supports full term recursion. Our approach is also rather more elementary, only 
requiring basic measure theory compared to the relatively heavy mathematics 
(category theory and synthetic measure theory) used by them. Regarding gen- 
eralizability, their approach is both general and compositional in the different 
inference transformations, while we abstract over parts of the SMC algorithm. 
This allows us, in particular, to relate directly to standard SMC correctness 
results. 

Section 7 concerns the instantiation of the transition kernels from (iii), and 
also discusses other SMC algorithms. Our last contribution is the following. 


(iv) We define a sequence of sub-probability kernels $k_{t,n}$ induced by a given
program $t$, corresponding to the fundamental SMC algorithm known as the
bootstrap particle filter (BPF) for our calculus. This is the most common
version of SMC, and we present a concrete SMC algorithm corresponding
to these kernels. We also discuss other SMC algorithms and their relation
to our formalization: the resample-move [11], alive [19], and auxiliary [28]
particle filters.


Importantly, by combining the above contributions, we justify that the
implementation strategies of the BPFs in WebPPL, Anglican, and Birch are indeed
correct. In fact, our results show that the strategy in Anglican, in which every
evaluation path must resample the same number of times, is too conservative.

An extended version of this paper is also available [20]. This extended version
includes rigorous definitions and detailed proofs for many lemmas found in the
paper, as well as further examples and comments. The lemmas proved in the
extended version are explicitly marked with †.


2 A Motivating Example from Phylogenetics 


In this section, we give a motivating example from phylogenetics. The example
is written in a functional PPL³ developed as part of this paper, in order to verify


3 The implementation is an interpreter written in OCaml. It largely follows the same 
approach as Anglican and WebPPL, and uses continuation-passing style in order to 
let tree = {
  left:{left:{age:0},right:{age:0},age:4},
  right:{left:{age:0},right:{age:0},age:6},
  age:10
} in

let lambda = 0.2 in let mu = 0.1 in

let crbdGoesExtinct startTime =
  let curTime = startTime
    - (sample (exponential (lambda + mu)))
  in
  if curTime < 0 then false
  else
    let speciation = sample
      (bernoulli (lambda / (lambda + mu))) in
    if !speciation then true
    else crbdGoesExtinct curTime
      && crbdGoesExtinct curTime in

let simBranch startTime stopTime =
  let curTime = startTime -
    sample (exponential lambda) in
  if curTime < stopTime then ()
  else if not (crbdGoesExtinct curTime)
  then weight (log 0) // #1
  else (weight (log 2); // #2
    simBranch curTime stopTime) in

let simTree tree parent =
  let w = -mu * (parent.age - tree.age) in
  weight w; // #3
  simBranch parent.age tree.age;
  match tree with
  | {left,right,age} ->
    simTree left tree; simTree right tree
  | {age} -> () in

simTree tree.left tree;
simTree tree.right tree


Fig. 1: A simplified version of a phylogenetic birth-death model from [30]. See
the text for a description.


and experiment with the presented concepts and results. In particular, this PPL
supports SMC inference (Algorithm 2) with decoupled resamples and weights⁴,
as well as sampling from random distributions with a sample construct.

Consider the program in Fig. 1, encoding a simplified version of a phylo- 
genetic birth-death model (see Ronquist et al. [30] for the full version). The 
problem is to find the model evidence for a particular birth rate (lambda = 
0.2) and death rate (mu = 0.1), given an observed phylogenetic tree. The tree 
represents known lineages of evolution, where the leaves are extant (surviving 
to the present) species. Most importantly, for illustrating the usefulness of the 
results in this paper, the recursive function simBranch, with its two weight ap- 
plications #1 and #2, is called a random number of times for each branch in the 
observed tree. Thus, different SMC executions encounter differing numbers of 
calls to weight. When resampling is performed after every call to weight (#1, #2, 
and #3), it is, because of the differing numbers of resamples, not obvious that 
inference is correct (e.g., the equivalent program in Anglican gives a runtime 
error). Our results show that such a resampling strategy is indeed correct. 

This strategy is far from optimal, however. For instance, only resampling at 
#3, which is encountered the same number of times in each execution, performs 
much better [21,30]. Our results show that this is correct as well, and that it gives 
the same asymptotic results as the naive strategy in the previous paragraph. 

Another strategy is to resample only at #1 and #3, again causing executions 
to encounter differing numbers of resamples. Because #1 weights with (log) 0, this 


pause and resume executions as part of inference. It is available at
https://github.com/miking-lang/miking-dppl/tree/pplcore. The example in Fig. 1
can be found under examples/crbd/crbd-esop.ppl.

4 The implementation uses log weights as arguments to weight for numerical reasons. 
approach gives the same accuracy as resampling only at #3, but avoids useless
computation since a zero-weight execution can never obtain non-zero weight.
Equivalently to resampling at #1, zero-weight executions can also be identified
and stopped automatically at runtime. This gives a direct performance gain,
and both are correct by our results. We compared the three strategies above for
SMC inference with 50000 particles⁵: resampling at #1, #2, and #3 resulted in a
runtime of 15.0 seconds, at #3 in a runtime of 12.6 seconds, and at #1 and #3 in
a runtime of 11.2 seconds. Furthermore, resampling at #1, #2, and #3 resulted in
significantly worse accuracy compared to the other two strategies [21,30].

Summarizing the above, the results in this paper ensure correctness when 
exploring different resampling placement strategies. As just demonstrated, this 
is useful, because resampling strategies can have a large impact on SMC accuracy 
and performance. 


3 A Calculus for Probabilistic Programming Languages 


In this section, we define the calculus used throughout the paper. In Section 3.1, 
we begin by defining the syntax, and demonstrate how a simple probability dis- 
tribution can be encoded using it. In Section 3.2, we define the semantics and 
demonstrate it on the previously encoded probability distribution. This seman- 
tics is used in Section 4 to define the target measure for any given program. In 
Section 3.3, we extend the semantics of Section 3.2 to limit the number of al- 
lowed resamples in an evaluation. This extended semantics forms the foundation 
for formalizing SMC in Sections 6 and 7. 


3.1 Syntax 


The main difference between the calculus presented in this section and the stan- 
dard untyped lambda calculus is the addition of real numbers, functions oper- 
ating on real numbers, a sampling construct for drawing random values from 
real-valued probability distributions, and a construct for weighting executions. 
The rationale for making these additions is that, in addition to discrete prob- 
ability distributions, continuous distributions are ubiquitous in most real-world 
models, and the weighting construct is essential for encoding inference problems. 
In order to define the calculus, we let $X$ be a countable set of variable names;
$D \in \mathbb{D}$ range over a countable set $\mathbb{D}$ of identifiers for families of
probability distributions over $\mathbb{R}$, where the family for each identifier $D$ has a
fixed number of real parameters $|D|$; and $g \in \mathbb{G}$ range over a countable set
$\mathbb{G}$ of identifiers for real-valued functions with respective arities $|g|$. More
precisely, for each $g$, there is a measurable function
$\sigma_g : \mathbb{R}^{|g|} \to \mathbb{R}$. For simplicity, we often use $g$ to denote
both the identifier and its measurable function. We can now give an inductive
definition of the abstract syntax, consisting of values $v$ and terms $t$.


5 We repeated each experiment 20 times on a machine running Ubuntu 20.04 with an 
Intel i5-2500K CPU (4 cores) and 8GB memory. The standard deviation was under 
0.1 seconds in all three cases. 




(a)  sample_Beta(2, 2)

(b)  let p = sample_Beta(2, 2) in
     let observe o = weight(f_Bern(p, o)) in
     iter observe [true, false, true]; p

(c)  [Plot of the two densities over the coin bias; not reproduced here.]


Fig. 2: The Beta(2,2) distribution as a program in (a), and visualized with a
solid line in (c). Also, the program $t_{obs}$ in (b), visualized with a dashed line
in (c). The iter function in (b) simply maps the given function over the given
list and returns (). That is, it calls observe true, observe false, and observe true
purely for the side-effect of weighting.


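Definition 1.

$$
\begin{aligned}
t &::= v \mid x \mid t\ t \mid \mathtt{if}\ t\ \mathtt{then}\ t\ \mathtt{else}\ t \mid g(t_1, \ldots, t_{|g|}) \\
  &\;\;\mid\; \mathtt{sample}_D(t_1, \ldots, t_{|D|}) \mid \mathtt{weight}(t) \mid \mathtt{resample} \\
v &::= c \mid \lambda x. t
\end{aligned}
\tag{1}
$$

Here, $c \in \mathbb{R}$, $x \in X$, $D \in \mathbb{D}$, $g \in \mathbb{G}$. We denote the set of
all terms by $\mathbb{T}$ and the set of all values by $\mathbb{V}$.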
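
To make Definition 1 concrete, the abstract syntax can be sketched as Python dataclasses. This is only an illustration: the class names (Const, Var, Lam, and so on) and the helper is_value are assumptions of this sketch, and it is unrelated to the OCaml implementation accompanying the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Const:          # c, a real constant (a value)
    value: float


@dataclass(frozen=True)
class Var:            # x, a variable name
    name: str


@dataclass(frozen=True)
class Lam:            # lambda x. t (a value)
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:            # t t
    fn: "Term"
    arg: "Term"


@dataclass(frozen=True)
class If:             # if t then t else t
    cond: "Term"
    then: "Term"
    orelse: "Term"


@dataclass(frozen=True)
class Prim:           # g(t_1, ..., t_|g|)
    ident: str
    args: Tuple["Term", ...]


@dataclass(frozen=True)
class Sample:         # sample_D(t_1, ..., t_|D|)
    dist: str
    args: Tuple["Term", ...]


@dataclass(frozen=True)
class Weight:         # weight(t)
    arg: "Term"


@dataclass(frozen=True)
class Resample:       # resample
    pass


Term = Union[Const, Var, Lam, App, If, Prim, Sample, Weight, Resample]


def is_value(t: Term) -> bool:
    """Values are exactly the constants and the lambda abstractions."""
    return isinstance(t, (Const, Lam))


# The program of Fig. 2a: sample_Beta(2, 2).
fig_2a = Sample("Beta", (Const(2.0), Const(2.0)))
```

Frozen dataclasses give structural equality for free, which is convenient when comparing terms during evaluation.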


The formal semantics is given in Section 3.2. Here, we instead give an informal 
description of the various language constructs. 

Some examples of distribution identifiers are $\mathcal{N} \in \mathbb{D}$, the identifier
for the family of normal distributions, and $\mathcal{U} \in \mathbb{D}$, the identifier for
the family of continuous uniform distributions. The semantics of the term
$\mathtt{sample}_{\mathcal{N}}(0, 1)$ is, informally, “draw a random sample from the
normal distribution with mean 0 and variance 1”. The weight construct is
illustrated later in this section, and we discuss the resample construct in detail in
Sections 3.3 and 6.

We use common syntactic sugar throughout the paper. Most importantly, we
use false and true as aliases for 0 and 1, respectively, and () (unit) as another alias
for 0. Furthermore, we often write $g \in \mathbb{G}$ as infix operators. For instance,
$1 + 2$ is a valid term, where $+ \in \mathbb{G}$. Now, let $\mathbb{R}_+$ denote the
non-negative reals. We define $f_D : \mathbb{R}^{|D|+1} \to \mathbb{R}_+$ as the function
$f_D \in \mathbb{G}$ such that $f_D(c_1, \ldots, c_{|D|}, \cdot)$ is the probability density
(continuous distribution) or mass function (discrete distribution) for the
probability distribution corresponding to $D \in \mathbb{D}$ and $(c_1, \ldots, c_{|D|})$.
For example, $f_{\mathcal{N}}(0, 1, x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{1}{2} x^2}$ is the
standard probability density of the normal distribution with mean 0 and variance 1.
Lastly, we will also use let bindings, let rec bindings, sequencing using ;, and
lists (all of which can be encoded in the calculus). Sequencing is required for the
side-effects produced by weight (see Definition 5) and resample (see Sections 3.3
and 6).

We now consider an example. In Sections 3.2 and 4.3, this example will be
further considered to illustrate the semantics and the target measure, respectively.
Here, we first give the syntax, and informally visualize the probability
distributions (i.e., the target measures, as we will see in Section 4.3) for the example.

Consider first the program in Fig. 2a, directly encoding the Beta(2,2)
distribution, illustrated in Fig. 2c. This distribution naturally represents the
uncertainty in the bias of a coin: in this case, the coin is most likely unbiased (bias
0.5), and biases closer to 0 and 1 are less likely. In Fig. 2b, we extend Fig. 2a
by observing the sequence [true, false, true] when flipping the coin. These
observations are encoded using the weight construct, which simply accumulates
a product (as a side-effect) of all real-valued arguments given to it throughout
the execution. First, recall the standard mass function
($\sigma_{f_{Bern}}(p, \mathit{true}) = p$; $\sigma_{f_{Bern}}(p, \mathit{false}) = 1 - p$;
$\sigma_{f_{Bern}}(p, x) = 0$ otherwise) for the Bernoulli distribution corresponding to
$f_{Bern} \in \mathbb{G}$. The observations [true, false, true] are encoded using the
observe function, which uses the weight construct internally to assign weights to the
current value $p$ according to the Bernoulli mass function. As an example, assume
we have drawn $p = 0.4$. The weight for this execution is
$\sigma_{f_{Bern}}(0.4, \mathit{true}) \cdot \sigma_{f_{Bern}}(0.4, \mathit{false}) \cdot \sigma_{f_{Bern}}(0.4, \mathit{true}) = 0.4^2 \cdot 0.6$.
Now consider $p = 0.6$ instead. For this value of $p$ the weight is instead
$0.6^2 \cdot 0.4$. This explains the shift in Fig. 2c: a bias closer to 1 is more likely,
since we have observed two true flips, but only one false.
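
The arithmetic of this example can be reproduced with a short Python sketch. The function names bernoulli_pmf and execution_weight are hypothetical and only mirror, outside the calculus, what repeated calls to weight accumulate.

```python
def bernoulli_pmf(p: float, outcome: bool) -> float:
    """Mass function of the Bernoulli distribution with success probability p."""
    return p if outcome else 1.0 - p


def execution_weight(p: float, observations) -> float:
    """Accumulate the product of weights, as repeated calls to weight would."""
    w = 1.0
    for o in observations:
        w *= bernoulli_pmf(p, o)
    return w


observations = [True, False, True]
w_low = execution_weight(0.4, observations)    # 0.4^2 * 0.6
w_high = execution_weight(0.6, observations)   # 0.6^2 * 0.4
```

Since w_high exceeds w_low, executions with a bias of 0.6 contribute more mass than executions with a bias of 0.4, in line with the shift described above.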


3.2 Semantics 


In this section, we define the semantics of our calculus. The definition is split 
into two parts: a deterministic semantics and a stochastic semantics. We use 
evaluation contexts to assist in defining our semantics. The evaluation contexts 
E induce a call-by-value semantics, and are defined as follows. 


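Definition 2.

$$
\begin{aligned}
E ::=\;& [\cdot] \mid E\ t \mid (\lambda x. t)\ E \mid \mathtt{if}\ E\ \mathtt{then}\ t\ \mathtt{else}\ t \\
\mid\;& g(c_1, \ldots, c_m, E, t_{m+2}, \ldots, t_{|g|}) \\
\mid\;& \mathtt{sample}_D(c_1, \ldots, c_m, E, t_{m+2}, \ldots, t_{|D|}) \mid \mathtt{weight}(E)
\end{aligned}
\tag{2}
$$

We denote the set of all evaluation contexts by $\mathbb{E}$.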


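With the evaluation contexts in place, we proceed to define the deterministic
semantics through a small-step relation $\to_{\mathrm{Det}}$.


Definition 3.

$$
\begin{gathered}
\frac{}{E[(\lambda x. t)\ v] \to_{\mathrm{Det}} E[[x \mapsto v]t]}\;(\textsc{App})
\qquad
\frac{c = \sigma_g(c_1, \ldots, c_{|g|})}{E[g(c_1, \ldots, c_{|g|})] \to_{\mathrm{Det}} E[c]}\;(\textsc{Prim}) \\[1ex]
\frac{}{E[\mathtt{if}\ \mathit{true}\ \mathtt{then}\ t_1\ \mathtt{else}\ t_2] \to_{\mathrm{Det}} E[t_1]}\;(\textsc{IfTrue}) \\[1ex]
\frac{}{E[\mathtt{if}\ \mathit{false}\ \mathtt{then}\ t_1\ \mathtt{else}\ t_2] \to_{\mathrm{Det}} E[t_2]}\;(\textsc{IfFalse})
\end{gathered}
\tag{3}
$$

The rules are straightforward, and will not be discussed in further detail here.
We use the standard notation for transitive and reflexive closures (e.g.,
$\to^*_{\mathrm{Det}}$), and transitive closures (e.g., $\to^+_{\mathrm{Det}}$) of relations
throughout the paper.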
Following the tradition of Kozen [18] and Park et al. [27], sampling in our
stochastic semantics works by consuming randomness from a tape of real
numbers. We use inverse transform sampling, and therefore the tape consists of
numbers from the interval [0, 1]. In order to use inverse transform sampling, we
require that for each $D \in \mathbb{D}$, there exists a measurable function
$F_D^{-1} : \mathbb{R}^{|D|} \times [0, 1] \to \mathbb{R}$, such that
$F_D^{-1}(c_1, \ldots, c_{|D|}, \cdot)$ is the inverse cumulative distribution
function for the probability distribution corresponding to $D$ and
$(c_1, \ldots, c_{|D|})$. We call the tape of real numbers a trace, and make the
following definition.


Definition 4. Let $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. The set of all traces is
$\mathbb{S} = \bigcup_{n \in \mathbb{N}_0} [0, 1]^n$.


We use the notation $(c_1, c_2, \ldots, c_n)_{\mathbb{S}}$ to indicate the trace consisting
of the $n$ numbers $c_1, c_2, \ldots, c_n$. Given a trace $s$, we denote by $|s|$ the
length of the trace. We also denote the concatenation of two traces $s$ and $s'$ with
$s * s'$. Lastly, we let $c :: s$ denote the extension of the trace $s$ with the real
number $c$ as head.
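
How sampling consumes a trace can be sketched in a few lines of Python. The sketch assumes the exponential distribution, whose inverse cumulative distribution function has a closed form; the function names are hypothetical and chosen for this illustration only.

```python
import math
from typing import List, Tuple


def exponential_inv_cdf(rate: float, p: float) -> float:
    """Inverse CDF of the exponential distribution with the given rate."""
    return -math.log(1.0 - p) / rate


def sample_from_trace(rate: float, trace: List[float]) -> Tuple[float, List[float]]:
    """Consume the head p of the trace; return (F^{-1}(rate, p), remaining trace)."""
    if not trace:
        raise ValueError("empty trace: evaluation would get stuck")
    p, rest = trace[0], trace[1:]
    return exponential_inv_cdf(rate, p), rest


trace = [0.5, 0.25]                       # the trace (0.5, 0.25)
x1, trace = sample_from_trace(1.0, trace)
x2, trace = sample_from_trace(1.0, trace)
```

Given the same trace, the drawn values are always the same, which is why a semantics over traces can model stochastic behavior while remaining a deterministic relation.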

With the traces and $F_D^{-1}$ defined, we can proceed to the stochastic⁶ semantics
$\to$ over $\mathbb{T} \times \mathbb{R}_+ \times \mathbb{S}$.


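Definition 5.

$$
t_{stop} ::= v \mid E[\mathtt{sample}_D(c_1, \ldots, c_{|D|})] \mid E[\mathtt{weight}(c)] \mid E[\mathtt{resample}]
\tag{4}
$$

$$
\begin{gathered}
\frac{t \to^+_{\mathrm{Det}} t_{stop}}{t, w, s \to t_{stop}, w, s}\;(\textsc{Det})
\qquad
\frac{c \geq 0}{E[\mathtt{weight}(c)], w, s \to E[()], w \cdot c, s}\;(\textsc{Weight}) \\[1ex]
\frac{c = F_D^{-1}(c_1, \ldots, c_{|D|}, p)}{E[\mathtt{sample}_D(c_1, \ldots, c_{|D|})], w, p :: s \to E[c], w, s}\;(\textsc{Sample}) \\[1ex]
\frac{}{E[\mathtt{resample}], w, s \to E[()], w, s}\;(\textsc{Resample})
\end{gathered}
\tag{5}
$$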


The rule (DET) encapsulates the $\to_{\mathrm{Det}}$ relation, and states that terms can
move deterministically only to terms of the form $t_{stop}$. Note that terms of the
form $t_{stop}$ are found at the left-hand side in the other rules. The (SAMPLE) rule
describes how random values are drawn from the inverse cumulative distribution
functions and the trace when terms of the form
$\mathtt{sample}_D(c_1, \ldots, c_{|D|})$ are encountered. Similarly, the (WEIGHT) rule
determines how the weight is updated when $\mathtt{weight}(c)$ terms are encountered.
Finally, the resample construct always evaluates to unit, and is therefore
meaningless from the perspective of this semantics. We elaborate on the role of the
resample construct in Section 3.3.

With the semantics in place, we define two important functions over S for a 
given term. In the below definition, assume that a fixed term t is given. 


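Definition 6.

$$
r_t(s) =
\begin{cases}
v & \text{if } t, 1, s \to^* v, w, ()_{\mathbb{S}} \\
() & \text{otherwise}
\end{cases}
\qquad
f_t(s) =
\begin{cases}
w & \text{if } t, 1, s \to^* v, w, ()_{\mathbb{S}} \\
0 & \text{otherwise}
\end{cases}
\tag{6}
$$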


6 Note that the semantics models stochastic behavior, but is itself a deterministic 
relation. 
Intuitively, $r_t$ is the function returning the result value after having repeatedly
applied $\to$ on the initial trace $s$. Analogously, $f_t$ gives the density or weight
of a particular $s$. Note that, if $(t, 1, s)$ gets stuck or diverges, the result value
is (), and the weight is 0. In other words, we disregard such traces entirely,
since we are in practice only interested in probability distributions over values.
Furthermore, note that if the final $s \neq ()_{\mathbb{S}}$, the value and weight are again
() and 0, respectively. The motivation for this is discussed in Section 4.3.

To illustrate $r_t$, $f_t$, and the weight construct, consider the program $t_{obs}$ in
Fig. 2b, and the singleton trace $(0.8)_{\mathbb{S}}$. This program will, in total, evaluate
one call to sample, and three calls to weight. Now, let
$h(c) = F^{-1}_{Beta}(2, 2, c)$ and recall the function $\sigma_{f_{Bern}}$ from Section 3.1.
Using the notation $\phi(c, x) = \sigma_{f_{Bern}}(h(c), x)$, we have, for some evaluation
contexts $E_1, E_2, E_3, E_4$,

$$
\begin{aligned}
t_{obs}, 1, (0.8)_{\mathbb{S}}
  &\to E_1[\mathtt{sample}_{Beta}(2, 2)], 1, (0.8)_{\mathbb{S}} \to E_1[h(0.8)], 1, ()_{\mathbb{S}} \\
  &\to^+ E_2[\mathtt{weight}(\phi(0.8, \mathit{true}))], 1, ()_{\mathbb{S}} \to E_2[()], \phi(0.8, \mathit{true}), ()_{\mathbb{S}} \\
  &= E_2[()], h(0.8), ()_{\mathbb{S}} \to^+ E_3[()], \phi(0.8, \mathit{false}) \cdot h(0.8), ()_{\mathbb{S}} \\
  &\to^+ E_4[()], \phi(0.8, \mathit{true}) \cdot (1 - h(0.8)) \cdot h(0.8), ()_{\mathbb{S}} \\
  &\to^+ h(0.8), h(0.8) \cdot (1 - h(0.8)) \cdot h(0.8), ()_{\mathbb{S}}.
\end{aligned}
\tag{7}
$$

That is, $r_{t_{obs}}((0.8)_{\mathbb{S}}) = h(0.8)$ and
$f_{t_{obs}}((0.8)_{\mathbb{S}}) = h(0.8)^2 (1 - h(0.8))$. For arbitrary $c$, we see that
$r_{t_{obs}}((c)_{\mathbb{S}}) = h(c)$ and $f_{t_{obs}}((c)_{\mathbb{S}}) = h(c)^2 (1 - h(c))$. For
any other trace $s$ with $|s| \neq 1$, $r_{t_{obs}}(s) = ()$ and $f_{t_{obs}}(s) = 0$. We
will apply this result when reconsidering this example in Section 4.3.
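
The two functions can also be evaluated numerically for this particular program. The following Python sketch hard-codes the program of Fig. 2b rather than interpreting the calculus, and computes the inverse Beta(2,2) cumulative distribution function by bisection; both choices, and all function names, are assumptions of the sketch.

```python
from typing import List, Tuple, Union


def beta22_inv_cdf(c: float, tol: float = 1e-12) -> float:
    """h(c): inverse of the Beta(2,2) CDF F(x) = 3x^2 - 2x^3, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 3.0 * mid**2 - 2.0 * mid**3 < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def run_t_obs(trace: List[float]) -> Tuple[Union[float, tuple], float]:
    """Return (result value, weight) for the program of Fig. 2b and a given trace."""
    if len(trace) != 1:
        # Too short: sample gets stuck. Too long: the final trace is non-empty.
        return (), 0.0
    p = beta22_inv_cdf(trace[0])           # the single call to sample
    weight = 1.0
    for outcome in (True, False, True):    # the three calls to weight
        weight *= p if outcome else 1.0 - p
    return p, weight
```

For the trace (0.5), the result is h(0.5) = 0.5 with weight 0.5^2 * 0.5 = 0.125, while the empty trace and any trace of length two or more give () and 0.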


3.3 Resampling Semantics 


In order to connect SMC in PPLs to the classical formalization of SMC presented
in Section 5, and thus enable the theoretical treatments in Sections 6 and 7,
we need a relation in which terms “stop” after a certain number $n$ of encountered
resample terms. In this section, we define such a relation, denoted by
$\hookrightarrow$. Its definition is given below.


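Definition 7.

$$
\begin{gathered}
\frac{t \neq E[\mathtt{resample}] \qquad t, w, s \to t', w', s'}{t, w, s, n \hookrightarrow t', w', s', n}\;(\textsc{Stoch-Fin}) \\[1ex]
\frac{n > 0 \qquad E[\mathtt{resample}], w, s \to E[()], w, s}{E[\mathtt{resample}], w, s, n \hookrightarrow E[()], w, s, n - 1}\;(\textsc{Resample-Fin})
\end{gathered}
\tag{8}
$$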


This relation is $\to$ extended with a natural number $n$, indicating how many
further resample terms can be evaluated. We implement this limitation by
replacing the rule (RESAMPLE) of $\to$ with (RESAMPLE-FIN) of $\hookrightarrow$ above,
which decrements $n$ each time it is applied, causing terms to get stuck at the
$(n + 1)$th resample encountered.

Now, assume that a fixed term $t$ is given. We define $r_{t,n}$ and $f_{t,n}$ similar to
$r_t$ and $f_t$.

Definition 8.
$$
r_{t,n}(s) =
\begin{cases}
v & \text{if } t, 1, s, n \hookrightarrow^* v, w, ()_{\mathbb{S}}, n' \\
E[\mathtt{resample}] & \text{if } t, 1, s, n \hookrightarrow^* E[\mathtt{resample}], w, ()_{\mathbb{S}}, 0 \\
() & \text{otherwise}
\end{cases}
$$

Definition 9.
$$
f_{t,n}(s) =
\begin{cases}
w & \text{if } t, 1, s, n \hookrightarrow^* v, w, ()_{\mathbb{S}}, n' \\
w & \text{if } t, 1, s, n \hookrightarrow^* E[\mathtt{resample}], w, ()_{\mathbb{S}}, 0 \\
0 & \text{otherwise}
\end{cases}
$$


As for $r_t$ and $f_t$, these functions return the result value and weight, respectively,
after having repeatedly applied $\hookrightarrow$ on the initial trace $s$. There is one
difference compared to $\to$: besides values, we now also allow stopping with
non-zero weight at terms of the form $E[\mathtt{resample}]$.

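To illustrate $\hookrightarrow$, $r_{t,n}(s)$, and $f_{t,n}(s)$, consider the term $t_{seq}$
defined by

$$
\begin{aligned}
&\mathtt{let}\ \mathit{observe}\ x\ o = \mathtt{weight}(f_{\mathcal{N}}(x, 4, o));\ \mathtt{resample}\ \mathtt{in} \\
&\mathtt{let}\ \mathit{sim}\ x_{n-1}\ o_n = \\
&\quad \mathtt{let}\ x_n = \mathtt{sample}_{\mathcal{N}}(x_{n-1} + 2, 1)\ \mathtt{in}\ \mathit{observe}\ x_n\ o_n;\ x_n\ \mathtt{in} \\
&\mathtt{let}\ x_0 = \mathtt{sample}_{\mathcal{U}}(0, 100)\ \mathtt{in} \\
&\mathtt{let}\ f = \mathit{foldl}\ \mathit{sim}\ \mathtt{in}\ f\ x_0\ [c_1, c_2, \ldots, c_{t-1}, c_t].
\end{aligned}
\tag{9}
$$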


This term encodes a model in which an object moves along a real-valued axis
in discrete time steps, but where the actual positions ($x_1, x_2, \ldots$) can only be
observed through a noisy sensor ($c_1, c_2, \ldots$). The inference problem consists
of finding the probability distribution for the very last position, $x_t$, given all
collected observations ($c_1, c_2, \ldots, c_t$). Most importantly, note the position of
resample in (9): it is evaluated just after evaluating weight in every folding
step. Because of this, for $n < t$ and all traces $s$ such that $f_{t_{seq},n}(s) > 0$, we
have $r_{t_{seq},n}(s) = E_{seq}^n[\mathtt{resample}; x_n]$, where
$E_{seq}^n = f\ [\cdot]\ [c_{n+1}, c_{n+2}, \ldots, c_{t-1}, c_t]$ and where $x_n$ is the
value sampled in sim at the $n$th folding step. That is, we can now “stop”
evaluation at resamples. We will revisit this example in Section 6.
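
The idea of stopping at resamples can be sketched with Python generators, which suspend execution in much the same way. The sketch follows the description of Definition 7 (evaluation gets stuck at the (n + 1)th resample). As a simplifying assumption, the sampled positions are supplied up front instead of being drawn from a trace, and all names and numbers are hypothetical.

```python
import math
from typing import Iterator, List, Tuple


def normal_pdf(mean: float, var: float, x: float) -> float:
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def t_seq(xs: List[float], observations: List[float]) -> Iterator[Tuple[str, float]]:
    """A stand-in for the term in (9) with the sampled positions xs given up front.

    Yields ("resample", w) at each resample, where w is the accumulated weight,
    and finally ("value", w) once evaluation reaches a value.
    """
    w = 1.0
    for x, o in zip(xs[1:], observations):
        w *= normal_pdf(x, 4.0, o)     # weight(f_N(x, 4, o))
        yield ("resample", w)          # resample
    yield ("value", w)


def run_with_budget(program: Iterator[Tuple[str, float]], n: int) -> Tuple[str, float]:
    """Evaluate until a value is reached or the (n + 1)th resample is encountered."""
    for kind, w in program:
        if kind == "value":
            return kind, w
        if n == 0:
            return kind, w             # stuck at a resample with no budget left
        n -= 1
    raise RuntimeError("program ended without producing a value")


xs = [0.0, 2.0, 4.0, 6.0]              # x_0 and three later positions (assumed)
obs = [2.5, 3.5, 6.0]                  # three observations (assumed)
stopped = run_with_budget(t_seq(xs, obs), 1)
finished = run_with_budget(t_seq(xs, obs), 3)
```

With a budget of 1, evaluation stops at the second resample and reports the weight accumulated from the first two observations; with a budget of 3, all resamples are passed and a value is reached.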


4 The Target Measure of a Program 


In this section, we define the target measure induced by any given program in our 
calculus. We assume basic familiarity with measure theory, Lebesgue integration, 
and Borel spaces. McDonald and Weiss [23] provide a pedagogical introduction 
to the subject. In order to define the target measure of a program as a Lebesgue 
integral (Section 4.3), we require a measure space on traces (Section 4.1), and 
a measurable space on terms (Section 4.2). For illustration, we derive the target 
measure for the example program from Section 3 in Section 4.3. The concepts 
presented in this section are quite standard, and experienced readers might want 
to quickly skim it, or even skip it entirely. 
4.1 A Measure Space over Traces


We use a standard measure space over traces of samples [22]. First, we define
a measurable space over traces. We denote the Borel σ-algebra on $\mathbb{R}^n$ with
$\mathcal{B}^n$, and the Borel σ-algebra on $[0, 1]^n$ with $\mathcal{B}^n_{[0,1]}$.


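Definition 10. The σ-algebra $\mathcal{S}$ on $\mathbb{S}$ is the σ-algebra consisting of sets
of the form $S = \bigcup_{n \in \mathbb{N}_0} B_n$ with $B_n \in \mathcal{B}^n_{[0,1]}$.
Naturally, $[0, 1]^0$ is the singleton set containing the empty trace. In other words,
$([0, 1]^0, \mathcal{B}^0_{[0,1]}) = (\{()_{\mathbb{S}}\}, \{\{()_{\mathbb{S}}\}, \emptyset\})$,
where $()_{\mathbb{S}}$ denotes the empty trace.


Lemma 1. $(\mathbb{S}, \mathcal{S})$ is a measurable space.†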


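The most common measure on $\mathcal{B}^n$ is the $n$-dimensional Lebesgue measure,
denoted $\lambda_n$. For $n = 0$, we let $\lambda_0 = \delta_{()_{\mathbb{S}}}$, where $\delta$
denotes the standard Dirac measure. By combining the Lebesgue measures for each
$n$, we construct a measure $\mu_{\mathbb{S}}$ over $(\mathbb{S}, \mathcal{S})$.


Definition 11. $\mu_{\mathbb{S}}(S) = \mu_{\mathbb{S}}\left(\bigcup_{n \in \mathbb{N}_0} B_n\right) = \sum_{n \in \mathbb{N}_0} \lambda_n(B_n)$

Lemma 2. $(\mathbb{S}, \mathcal{S}, \mu_{\mathbb{S}})$ is a measure space. Furthermore,
$\mu_{\mathbb{S}}$ is σ-finite.†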
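
As a small worked instance of Definition 11, consider the set $S$ with components $B_0 = \{()_{\mathbb{S}}\}$, $B_1 = [0, \tfrac{1}{2}]$, $B_2 = [0, \tfrac{1}{2}]^2$, and $B_n = \emptyset$ for $n \geq 3$. This set is chosen purely for illustration and does not appear elsewhere in the paper.

```latex
\mu_{\mathbb{S}}(S)
  = \lambda_0(\{()_{\mathbb{S}}\})
  + \lambda_1\!\left([0, \tfrac{1}{2}]\right)
  + \lambda_2\!\left([0, \tfrac{1}{2}]^2\right)
  = 1 + \tfrac{1}{2} + \tfrac{1}{4}
  = \tfrac{7}{4}.
```

Each component $[0, 1]^n$ has measure 1, so $\mu_{\mathbb{S}}(\mathbb{S}) = \infty$: the measure is σ-finite but not finite.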


A comment on notation: we denote universal sets by blackboard bold capital
letters (e.g., $\mathbb{S}$), σ-algebras by calligraphic capital letters (e.g.,
$\mathcal{S}$), members of σ-algebras by capital letters (e.g., $S$), and individual
elements by lower case letters (e.g., $s$).


4.2 A Measurable Space over Terms 


In order to show that $r_t$ is measurable, we need a measurable space over terms.
We let $(\mathbb{T}, \mathcal{T})$ denote the measurable space that we seek to construct, and
follow the approach in Staton et al. [35] and Vákár et al. [39]. Because our calculus
includes the reals, we would like to at least have $\mathcal{B} \subset \mathcal{T}$.
Furthermore, we would also like to extend the Borel measurable sets $\mathcal{B}^n$ to
terms with $n$ reals as subterms. For instance, we want sets of the form
$\{(\lambda x. (\lambda y. x + y)\ c_2)\ c_1 \mid (c_1, c_2) \in B_2\}$ to be measurable,
where $B_2 \in \mathcal{B}^2$. This leads us to consider terms in a language in which
constants (i.e., reals) are replaced with placeholders $[\cdot]$.


Definition 12. Let $v_p ::= [\cdot] \mid \lambda x. t$ replace the values $v$ from
Definition 1. The set of all terms in the resulting new calculus is denoted with
$\mathbb{T}_p$.


Most importantly, it is easy to verify that $\mathbb{T}_p$ is countable. Next, we make
the following definitions.


Definition 13. For $n \in \mathbb{N}_0$, we denote by $\mathbb{T}_p^n \subset \mathbb{T}_p$ the set
of all terms with exactly $n$ placeholders.


Definition 14. We let $t_p^n$ range over the elements of $\mathbb{T}_p^n$. The $t_p^n$ can be
regarded as functions $t_p^n : \mathbb{R}^n \to t_p^n(\mathbb{R}^n)$ which replace the $n$
placeholders with the $n$ reals given as arguments.

Definition 15. $\mathcal{T}_{t_p^n} = \{t_p^n(B_n) \mid B_n \in \mathcal{B}^n\}$.

From the above definitions, we construct the required σ-algebra $\mathcal{T}$.


Definition 16. The σ-algebra $\mathcal{T}$ on $\mathbb{T}$ is the σ-algebra consisting of sets
of the form $T = \bigcup_{n \in \mathbb{N}_0} \bigcup_{t_p^n \in \mathbb{T}_p^n} t_p^n(B_n)$.


Lemma 3. $(\mathbb{T}, \mathcal{T})$ is a measurable space.†


4.3 The Target Measure 


We are now in a position to define the target measure. We will first give the 
formal definitions, and then illustrate the definitions with an example. The def- 
initions rely on the following result. 


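Lemma 4. $r_t : (\mathbb{S}, \mathcal{S}) \to (\mathbb{T}, \mathcal{T})$ and
$f_t : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ are measurable.†


We can now proceed to define the measure $\langle\!\langle t \rangle\!\rangle$ over $\mathbb{S}$
induced by a term $t$ using Lebesgue integration.


Definition 17. $\langle\!\langle t \rangle\!\rangle(S) = \int_S f_t(s) \, d\mu_{\mathbb{S}}(s)$


Using Definition 17 and the measurability of $r_t$, we can also define a
corresponding pushforward measure $\llbracket t \rrbracket$ over $\mathbb{T}$.


Definition 18. $\llbracket t \rrbracket(T) = \langle\!\langle t \rangle\!\rangle(r_t^{-1}(T)) = \int_{r_t^{-1}(T)} f_t(s) \, d\mu_{\mathbb{S}}(s)$.


The measure $\llbracket t \rrbracket$ is our target measure, i.e., the measure encoded by our
program that we are interested in.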

Let us now consider the target measure for the program given by $t_{obs}$. It is
not too difficult to show that
$\llbracket t_{obs} \rrbracket(T) = \int_{T \cap [0,1]} 6 c^3 (1 - c)^2 \, d\lambda(c)$. We
recognize the integrand as the (unnormalized) density for the Beta(4,3) distribution,
which, as expected, is exactly the graph shown in Fig. 2c.
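
The claim can be checked with a short derivation, sketched here under the identification of real-valued result terms with real numbers. Only traces of length one have non-zero density, so with $h$ and $f_{t_{obs}}$ from Section 3.2 we can substitute $x = h(c)$. Because $h$ is the inverse of the Beta(2,2) cumulative distribution function, whose density is $6x(1 - x)$, we have $d\lambda(c) = 6x(1 - x) \, d\lambda(x)$.

```latex
\begin{aligned}
\llbracket t_{obs} \rrbracket(T)
  &= \int_{h^{-1}(T)} h(c)^2 \, (1 - h(c)) \, d\lambda(c) \\
  &= \int_{T \cap [0,1]} x^2 (1 - x) \cdot 6 x (1 - x) \, d\lambda(x)
   = \int_{T \cap [0,1]} 6 x^3 (1 - x)^2 \, d\lambda(x).
\end{aligned}
```

The total mass is $6 \cdot B(4, 3) = 6 \cdot \tfrac{3! \, 2!}{6!} = \tfrac{1}{10}$, so normalizing gives the Beta(4,3) density $60 x^3 (1 - x)^2$.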

We should in some way ensure the target measure is finite (i.e., can be
normalized to a probability measure), since we are in the end most often only
interested in probability measures. Unfortunately, as observed by Staton [34], there
is no known useful syntactic restriction that enforces finite measures in PPLs while
still admitting weights > 1. We will discuss this further in Section 6.2 in relation
to SMC in our calculus.

Lastly, from Section 3.2, recall that we disallow non-empty final traces in
$f_t$ and $r_t$. We see here why this is needed: if allowed, for every trace $s$ with
$f_t(s) > 0$, all extensions $s * s'$ have the same density
$f_t(s * s') = f_t(s) > 0$. From this, it is easy to check that if
$\llbracket t \rrbracket \neq 0$ (the zero measure), then
$\llbracket t \rrbracket(\mathbb{T}) = \infty$ (i.e., the measure is not finite). In fact, for
any $T \in \mathcal{T}$, $\llbracket t \rrbracket(T) > 0 \implies \llbracket t \rrbracket(T) = \infty$.
Clearly, this is not a useful target measure.
5 Formal SMC


In this section, we give a generic formalization of SMC based on Chopin [6]. We 
assume a basic understanding of SMC. For a complete introduction to SMC, we 
recommend Naesseth et al. [26] and Doucet and Johansen [10]. 

First, in Section 5.1, we introduce transition kernels, a fundamental
concept used in the remaining sections of the paper. Second, in Section 5.2, we
describe Chopin’s generic formalization of SMC as an algorithm for approximating
a sequence of distributions based on a sequence of approximating transition
kernels. Lastly, in Section 5.3, we give standard correctness results for the
algorithm.


5.1 Preliminaries: Transition Kernels 


Intuitively, transition kernels describe how elements move between measurable 
spaces. For a more comprehensive introduction, see Vákár and Ong [40]. 


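Definition 19. Let $(\mathbb{A}, \mathcal{A})$ and $(\mathbb{A}', \mathcal{A}')$ be measurable
spaces, and let $\mathcal{B}_+^* = \{B \mid B \setminus \{\infty\} \in \mathcal{B}_+\}$. A
function $k : \mathbb{A} \times \mathcal{A}' \to \mathbb{R}_+^*$ is a (transition) kernel if
(1) for all $a \in \mathbb{A}$, $k(a, \cdot) : \mathcal{A}' \to \mathbb{R}_+^*$ is a measure on
$\mathcal{A}'$, and (2) for all $A' \in \mathcal{A}'$,
$k(\cdot, A') : (\mathbb{A}, \mathcal{A}) \to (\mathbb{R}_+^*, \mathcal{B}_+^*)$ is measurable.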


Additionally, we can classify transition kernels according to the below definition. 


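Definition 20. Let $(\mathbb{A}, \mathcal{A})$ and $(\mathbb{A}', \mathcal{A}')$ be measurable
spaces. A kernel $k : \mathbb{A} \times \mathcal{A}' \to \mathbb{R}_+^*$ is a sub-probability
kernel if $k(a, \cdot)$ is a sub-probability measure for all $a \in \mathbb{A}$; a
probability kernel if $k(a, \cdot)$ is a probability measure for all $a \in \mathbb{A}$;
and a finite kernel if $\sup_{a \in \mathbb{A}} k(a, \mathbb{A}') < \infty$.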
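
For finite spaces, a kernel is simply a matrix with non-negative entries, and the classification of Definition 20 reduces to inspecting row sums. The following Python sketch illustrates this special case; the function classify_kernel is hypothetical, and it reports only the most specific class (every probability kernel is also a sub-probability kernel and a finite kernel).

```python
from typing import List


def classify_kernel(k: List[List[float]], tol: float = 1e-12) -> str:
    """Classify a kernel between finite spaces, given as a non-negative matrix.

    Row a holds the measure k(a, .) on the singletons of the target space,
    so the total mass of k(a, .) is the sum of row a.
    """
    totals = [sum(row) for row in k]
    if all(abs(total - 1.0) <= tol for total in totals):
        return "probability"
    if all(total <= 1.0 + tol for total in totals):
        return "sub-probability"
    return "finite"   # finitely many rows with finite entries: bounded total mass
```

For example, a row that sums to 0.7 models a transition that loses mass, which is the kind of behavior a sub-probability kernel permits.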


5.2 Algorithm 


The starting point in Chopin’s formulation of SMC is a sequence of probability
measures $\pi_n$ (over respective measurable spaces $(\mathbb{A}_n, \mathcal{A}_n)$, with
$n \in \mathbb{N}_0$) that are difficult or impossible to directly draw samples from.

The SMC approach is to generate samples from the $\pi_n$ by first sampling
from a sequence of proposal measures $q_n$, and then correcting for the
discrepancy between these measures by weighting the proposal samples. The proposal
distributions are generated from an initial measure $q_0$ and a sequence of
transition kernels $k_n : \mathbb{A}_{n-1} \times \mathcal{A}_n \to [0, 1]$, $n \in \mathbb{N}$, as

$$
q_n(A_n) = \int_{\mathbb{A}_{n-1}} k_n(a_{n-1}, A_n) \, d\pi_{n-1}(a_{n-1}).
\tag{10}
$$


In order to approximate $\pi_n$ by weighting samples from $q_n$, we need some way
of obtaining the appropriate weights. Hence, we require each measurable space
$(\mathbb{A}_n, \mathcal{A}_n)$ to have a default σ-finite measure $\mu_{\mathbb{A}_n}$, and the
measures $\pi_n$ and $q_n$ to
Algorithm 1 A generic formulation of sequential Monte Carlo inference based
on Chopin [6]. In each step, we let $1 \leq j \leq J$, where $J$ is the number of samples.


1. Initialization: Set $n = 0$. Draw $a_0^j \sim q_0$ for $1 \leq j \leq J$.
   The empirical distribution given by $\{a_0^j\}_{j=1}^J$ approximates $q_0$.
2. Correction: Calculate $w_n^j = \frac{f_{\tilde\pi_n}(a_n^j)}{f_{\tilde q_n}(a_n^j)}$.
   The empirical distribution given by $\{(a_n^j, w_n^j)\}_{j=1}^J$ approximates $\pi_n$.
3. Selection: Resample the empirical distribution $\{(a_n^j, w_n^j)\}_{j=1}^J$.
   The new empirical distribution is unweighted and is given by
   $\{\hat a_n^j\}_{j=1}^J$. This distribution also approximates $\pi_n$.
4. Mutation: Increment $n$.
   Draw $a_n^j \sim k_n(\hat a_{n-1}^j, \cdot)$ for $1 \leq j \leq J$. The empirical
   distribution given by $\{a_n^j\}_{j=1}^J$ approximates $q_n$. Go to (2).


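have densities $f_{\pi_n}$ and $f_{q_n}$ with respect to this default measure.
Furthermore, we require that the functions $f_{\pi_n}$ and $f_{q_n}$ can be efficiently
computed pointwise, up to an unknown constant factor per function and value of $n$.
More precisely, we can efficiently compute the densities
$f_{\tilde\pi_n} = Z_{\tilde\pi_n} \cdot f_{\pi_n}$ and
$f_{\tilde q_n} = Z_{\tilde q_n} \cdot f_{q_n}$, corresponding to the unnormalized measures
$\tilde\pi_n = Z_{\tilde\pi_n} \cdot \pi_n$ and $\tilde q_n = Z_{\tilde q_n} \cdot q_n$.
Here, $Z_{\tilde\pi_n} = \tilde\pi_n(\mathbb{A}_n) \in \mathbb{R}_+$ and
$Z_{\tilde q_n} = \tilde q_n(\mathbb{A}_n) \in \mathbb{R}_+$ denote the unknown
normalizing constants for the distributions $\tilde\pi_n$ and $\tilde q_n$.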

Algorithm 1 presents a generic version of SMC [6] for approximating $\pi_n$. We
make the notion of approximation used in the algorithm precise in Section 5.3.
Note that in the correction step, the unnormalized pointwise evaluation of
$f_{\tilde\pi_n}$ and $f_{\tilde q_n}$ is used to calculate the weights. In the algorithm
description, we also use some new terminology. First, an empirical distribution is
the discrete probability measure formed by a finite set of possibly weighted samples
$\{(a_n^j, w_n^j)\}_{j=1}^J$, where $a_n^j \in \mathbb{A}_n$ and $w_n^j \in \mathbb{R}_+$. Second,
when resampling an empirical distribution, we sample $J$ times from it (with
replacement), with each sample having its normalized weight as probability of being
sampled. More specifically, this is known as multinomial resampling. Other
resampling schemes also exist [8], and are often used in practice to reduce variance.
After resampling, the set of samples forms a new empirical distribution with $J$
unweighted (all $w_n^j = 1$) samples.
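
Multinomial resampling as just described can be sketched in Python using the standard library. The function name multinomial_resample and the example data are assumptions of this sketch, not part of any implementation discussed in the paper.

```python
import random
from typing import List, Sequence, TypeVar

A = TypeVar("A")


def multinomial_resample(samples: Sequence[A], weights: Sequence[float],
                         rng: random.Random) -> List[A]:
    """Draw len(samples) samples with replacement, with probabilities
    proportional to the weights. The result is an unweighted sample set."""
    if sum(weights) <= 0.0:
        raise ValueError("all weights are zero; cannot resample")
    return rng.choices(samples, weights=weights, k=len(samples))


rng = random.Random(0)
resampled = multinomial_resample(["a", "b", "c"], [0.0, 1.0, 3.0], rng)
```

The sample "a" has weight zero and therefore never survives resampling, which is the mechanism by which zero-weight executions are discarded.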

An important feature of SMC compared to other inference algorithms is
that SMC produces, as a by-product of inference, unbiased estimates $\hat{Z}_{\tilde\pi_n}$ of
the normalizing constants $Z_{\tilde\pi_n}$. Stated differently, this means that Algorithm 1
not only approximates the $\pi_n$, but also the unnormalized versions $\tilde\pi_n$. From the
weights $w_n^j$ in Algorithm 1, the estimates are given by

$$\hat{Z}_{\tilde\pi_n} = \prod_{i=0}^{n} \frac{1}{J} \sum_{j=1}^{J} w_i^j \qquad (11)$$

for each n. We give the unbiasedness result of $\hat{Z}_{\tilde\pi_n}$ in Lemma 5 (item 2) below.
The normalizing constant is often used to compare the accuracy of different 
probabilistic models, and as such, it is also known as the marginal likelihood, or
model evidence. For an example application, see Ronquist et al. [30]. 
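
As a minimal sketch of how such an estimate could be computed, assume it is the product, over all time steps so far, of the average weight at each step. The function name and the nested-list layout `weights_per_step[i][j]` (the weight of sample j at step i) are hypothetical. Working in log space avoids underflow when many steps are multiplied together.

```python
import math


def log_normalizing_constant_estimate(weights_per_step):
    """Return the log of the product over steps of the mean weight per step."""
    log_z = 0.0
    for step_weights in weights_per_step:
        mean_w = sum(step_weights) / len(step_weights)
        if mean_w == 0.0:
            return -math.inf  # every sample had weight zero at this step
        log_z += math.log(mean_w)
    return log_z
```

Comparing two models on the same data then amounts to comparing the returned log estimates.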

To conclude this section, note that many sequences of probability kernels
$k_n$ can be used to approximate the same sequence of measures $\pi_n$. The only
requirement on the $k_n$ is that $f_{\pi_n}(a_n) > 0 \implies f_{q_n}(a_n) > 0$ must hold for all
$n \in \mathbb{N}_0$ and $a_n \in A_n$ (i.e., the proposals must "cover" the $\pi_n$) [9]. We call such a
sequence of kernels $k_n$ valid. Different choices of $k_n$ induce different proposals $q_n$,
and hence capture different SMC algorithms. The most common example is the
BPF, which directly uses the kernels from the model as the sequence of kernels
in the SMC algorithm (hence the "bootstrap"). In Section 7.1, we formalize the
bootstrap kernels in the context of our calculus. However, we may want to choose 
other probability kernels that satisfy the covering condition, since the choice of 
kernels can have major implications for the rate of convergence [28]. 
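
To make the covering condition concrete, the following sketch checks it on a finite set of points. The helper `covers` and its arguments are illustrative assumptions; a check over finitely many points can refute validity but cannot establish it for a continuous space.

```python
def covers(f_target, f_proposal, points):
    """True if f_target(a) > 0 implies f_proposal(a) > 0 for every a in points."""
    return all(f_proposal(a) > 0 for a in points if f_target(a) > 0)
```

A proposal that places zero density where the target has positive density fails the check, which is exactly the situation in which the importance weights cannot correct the proposal.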


5.3 Correctness 
We begin by defining the notion of approximation used in Algorithm 1. 


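Definition 21 (Based on Chopin [6, p. 2387]). Let $(A, \mathcal{A})$ denote a measurable
space, $\{\{(a^{j,J}, w^{j,J})\}_{j=1}^J\}_{J \in \mathbb{N}}$ a triangular array of random variables in
$A \times \mathbb{R}_+$, and $\pi : \mathcal{A} \to \mathbb{R}_+^*$ a probability measure. We say that
$\{\{(a^{j,J}, w^{j,J})\}_{j=1}^J\}_{J \in \mathbb{N}}$ approximates $\pi$ if the equality

$$\lim_{J \to \infty} \frac{\sum_{j=1}^J w^{j,J} \varphi(a^{j,J})}{\sum_{j=1}^J w^{j,J}} = \mathbb{E}_\pi(\varphi)$$

holds almost surely for all measurable functions $\varphi : (A, \mathcal{A}) \to (\mathbb{R}, \mathcal{B})$ such that
$\mathbb{E}_\pi(\varphi)$, the expected value of the function $\varphi$ over the distribution $\pi$, exists.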

First, note that the triangular array can also be viewed as a sequence of ran- 
dom empirical distributions (indexed by J). Precisely such sequences are formed 
by the random empirical distributions in Algorithm 1 when indexed by the in- 
creasing number of samples J. For simplicity, we often let context determine the 
sequence, and directly state that a random empirical distribution approximates 
some distribution (as in Algorithm 1). 
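
For a fixed J, the quantity inside the limit of Definition 21 is a self-normalized weighted average. A minimal sketch follows; the name `weighted_estimate` and the list-based inputs are assumptions made for illustration.

```python
def weighted_estimate(samples, weights, phi):
    """Self-normalized weighted average of phi over an empirical distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * phi(a) for a, w in zip(samples, weights)) / total
```

Because the weights are normalized by their sum, multiplying all weights by a common positive constant leaves the estimate unchanged, which is why unnormalized densities suffice in the correction step.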

Two classical results in SMC literature are given in the following lemma: a 
law of large numbers and the unbiasedness of the normalizing constant estimate. 
We take these results as the definition of SMC correctness used in this paper. 


Lemma 5. Let $\pi_n$, $n \in \mathbb{N}_0$, be a sequence of probability measures over measurable
spaces $(A_n, \mathcal{A}_n)$ with default $\sigma$-finite measures $\mu_{A_n}$, such that the $\pi_n$ have
densities $f_{\pi_n}$ with respect to these default measures. Furthermore, let $q_0$ be a
probability measure with density $f_{q_0}$ with respect to $\mu_{A_0}$, and $k_n$ a sequence of
probability kernels inducing a sequence of proposal probability measures $q_n$, given
by (10), over $(A_n, \mathcal{A}_n)$ with densities $f_{q_n}$ with respect to $\mu_{A_n}$. Also, assume the
$k_n$ are valid, i.e., that $f_{\pi_n}(a_n) > 0 \implies f_{q_n}(a_n) > 0$ holds for all $n \in \mathbb{N}_0$
and $a_n \in A_n$. Then

1. the empirical distributions $\{(a_n^j, w_n^j)\}_{j=1}^J$ and $\{\hat{a}_n^j\}_{j=1}^J$ produced by
Algorithm 1 approximate $\pi_n$ for each $n \in \mathbb{N}_0$; and
2. $\mathbb{E}(\hat{Z}_{\tilde\pi_n}) = Z_{\tilde\pi_n}$ for each $n \in \mathbb{N}_0$, where the expectation is taken with respect
to the weights produced when running Algorithm 1, and $\hat{Z}_{\tilde\pi_n}$ is given by (11).


Proof. As referenced in Naesseth et al. [26], see Del Moral [7, Theorem 7.4.3] for
1. For 2, see Naesseth et al. [26, Appendix 4.A].


Chopin [6, Theorem 1] gives another SMC convergence result in the form of a
central limit theorem. This result, however, requires further restrictions on the
weights $w_n^j$ in Algorithm 1. It is not clear when these restrictions are fulfilled when
applying SMC to a program in our calculus. This is an interesting topic for future work.


6 Formal SMC for Probabilistic Programming Languages 


This section contains our main contribution: how to interpret the operational
semantics of our calculus as the unnormalized sequence of measures $\tilde\pi_n$ in Chopin's
formalization (Section 6.1), as well as sufficient conditions for this sequence of
approximating measures to converge to $\langle\langle t \rangle\rangle$ and for the normalizing constant
estimate to be correct (Section 6.2).

An important insight during this work was that it is more convenient to
find an approximating sequence of measures $\langle\langle t \rangle\rangle_n$ to the trace measure $\langle\langle t \rangle\rangle$,
compared to finding a sequence of measures $\llbracket t \rrbracket_n$ directly approximating the
target measure $\llbracket t \rrbracket$. In Section 6.1, we define $\langle\langle t \rangle\rangle_n$ similarly to $\langle\langle t \rangle\rangle$, except that
at most n evaluations of resample are allowed. This upper bound on the number
of resamples is formalized through the relation $\hookrightarrow$ from Section 3.3.

In Section 6.2, we obtain two different conditions for the convergence of the
sequence $\langle\langle t \rangle\rangle_n$ to $\langle\langle t \rangle\rangle$: Theorem 1 states that for programs with an upper bound
N on the number of resamples they evaluate, $\langle\langle t \rangle\rangle_N = \langle\langle t \rangle\rangle$. This precondition
holds in many practical settings, for instance where each resampling is connected
to a datum collected before inference starts. Theorem 2 states another convergence
result for programs without such an upper bound but with dominated
weights. Because of these convergence results, we can often approximate $\langle\langle t \rangle\rangle$ by
approximating $\langle\langle t \rangle\rangle_n$ with Algorithm 1. When this is the case, Lemma 5 implies
that Algorithm 1, either after a sufficient number of time steps or asymptotically,
correctly approximates $\langle\langle t \rangle\rangle$ and the normalizing constant $Z_{\langle\langle t \rangle\rangle}$. This is the
content of Theorem 3. We conclude Section 6.2 by discussing resample placements
and their relation to Theorem 3, as well as practical implications of Theorem 3.


6.1 The Sequence of Measures Generated by a Program 


We now apply the formalization from Section 4.3 again, but with $f_{t,n}$ and $r_{t,n}$
(from Section 3.3) replacing $f_t$ and $r_t$. Intuitively, this yields a sequence of
measures $\llbracket t \rrbracket_n$ indexed by n, which are similar to $\llbracket t \rrbracket$, but only allow for evaluating
at most n resamples. To illustrate this idea, consider again the program $t_{seq}$ in
(9). Here, $\llbracket t_{seq} \rrbracket_0$ is a distribution over terms of the form $E^1_{seq}[\texttt{resample}; x_1]$,
$\llbracket t_{seq} \rrbracket_1$ a distribution over terms of the form $E^2_{seq}[\texttt{resample}; x_2]$, and so forth.
For n > t, $\llbracket t_{seq} \rrbracket_n = \llbracket t_{seq} \rrbracket$, because it is clear that t is an upper bound on the
number of resamples evaluated in $t_{seq}$.

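While the measures $\llbracket t \rrbracket_n$ are useful for giving intuition, it is easier from a
technical perspective to define and work with $\langle\langle t \rangle\rangle_n$, the sequence of measures
over traces where at most n resamples are allowed. First, we need the following
result, analogous to Lemma 4.


Lemma 6. $r_{t,n} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{T}, \mathcal{T})$ and $f_{t,n} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ are measurable.

This allows us to define $\langle\langle t \rangle\rangle_n$ (cf. Definition 17).

Definition 22. $\langle\langle t \rangle\rangle_n(S) = \int_S f_{t,n}(s) \, d\mu_{\mathbb{S}}(s)$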


6.2 Correctness 


We begin with a convergence result for when the number of calls to resample 
in a program is upper bounded. 


Theorem 1. If there is $N \in \mathbb{N}$ such that $f_{t,n} = f_t$ whenever n > N, then
$\langle\langle t \rangle\rangle_n = \langle\langle t \rangle\rangle$ for all n > N.


This follows directly since $f_{t,n}$ not only converges to $f_t$, but is also equal to $f_t$
for all n > N. However, even if the number of calls to resample in t is upper
bounded, there is still one concern with using $\langle\langle t \rangle\rangle_n$ as $\tilde\pi_n$ in Algorithm 1: there is
no guarantee that the measures $\langle\langle t \rangle\rangle_n$ can be normalized to probability measures
and have unique densities (i.e., that they are finite). This is a requirement for
the correctness results in Lemma 5. Unfortunately, recall from Section 4.3 that
there is no known useful syntactic restriction that enforces finiteness of the target
measure. This is clearly true for the measures $\langle\langle t \rangle\rangle_n$ as well, and as such, we need
to make the assumption that the $\langle\langle t \rangle\rangle_n$ are finite; otherwise, it is not clear that
Algorithm 1 produces the correct result, since the conditions in Lemma 5 are 
not fulfilled. Fortunately, this assumption is valid for most, if not all, models of 
practical interest. Nevertheless, investigating whether or not the restriction to 
probability measures in Lemma 5 can be lifted to some extent is an interesting 
topic for future work. 

Although of limited practical interest, programs with an unbounded number 
of calls to resample are of interest from a semantic perspective. If we have 
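$\lim_{n \to \infty} \langle\langle t \rangle\rangle_n = \langle\langle t \rangle\rangle$ pointwise, then any SMC algorithm approximating the
sequence $\langle\langle t \rangle\rangle_n$ also approximates $\langle\langle t \rangle\rangle$, at least asymptotically in the number of
steps n. First, consider the program $t_{geo\text{-}res}$ given by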


let rec geometric _ =
  resample; if sample_Bern(0.6) then 1 + geometric () else 1        (12)
in geometric ().


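Note that $t_{geo\text{-}res}$ has no upper bound on the number of calls to resample,
and therefore Theorem 1 is not applicable. It is easy, however, to check that
$\lim_{n \to \infty} \langle\langle t_{geo\text{-}res} \rangle\rangle_n = \langle\langle t_{geo\text{-}res} \rangle\rangle$ pointwise. So does
$\lim_{n \to \infty} \langle\langle t \rangle\rangle_n = \langle\langle t \rangle\rangle$ pointwise hold in general? The answer is no, as we demonstrate next.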
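
To see concretely why the geometric program has no bound on its number of resamples, the following sketch simulates single runs of it in Python and counts how many times resample would be evaluated. The function `run_geometric` and its return convention are assumptions for illustration; the recursion is unrolled into a loop.

```python
import random


def run_geometric(rng, p=0.6):
    """Simulate one run of the geometric program.

    Returns (value, resamples): the returned integer and the number of
    times `resample` was evaluated along the way.
    """
    value, resamples = 0, 0
    while True:
        resamples += 1           # each call to geometric starts with resample
        value += 1               # each call contributes 1 to the result
        if not (rng.random() < p):   # sample_Bern(p) returned false
            return value, resamples
```

Every run evaluates as many resamples as the value it returns, so the count varies from run to run and any fixed bound N is exceeded with positive probability.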
For $\lim_{n \to \infty} \langle\langle t \rangle\rangle_n = \langle\langle t \rangle\rangle$ to hold pointwise, it must hold that $\lim_{n \to \infty} f_{t,n} = f_t$
pointwise $\mu_{\mathbb{S}}$-ae. Unfortunately, this does not hold for all programs. Consider
the program $t_{loop}$ defined by let rec loop _ = resample; loop () in loop ().
Here, $f_{t_{loop}} = 0$ since the program diverges deterministically, but $f_{t_{loop},n}(()_{\mathbb{S}}) = 1$
for all n. Because $\mu_{\mathbb{S}}(\{()_{\mathbb{S}}\}) \neq 0$, we do not have $\lim_{n \to \infty} f_{t_{loop},n} = f_{t_{loop}}$ pointwise
$\mu_{\mathbb{S}}$-ae.

Even if we have $\lim_{n \to \infty} f_{t,n} = f_t$ pointwise $\mu_{\mathbb{S}}$-ae, we might not have
$\lim_{n \to \infty} \langle\langle t \rangle\rangle_n = \langle\langle t \rangle\rangle$ pointwise. Consider, for instance, the program $t_{unit}$ given
by


let s = sample_U(0,1) in
let rec foo n =
  if s ≤ 1/n then resample; weight 2; foo (2·n) else weight 0 in        (13)
foo 1


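We have $f_{t_{unit}} = 0$ and $f_{t_{unit},n} = 2^n \cdot 1_{[0, 1/2^n]}$ for n > 0. Also, $\lim_{n \to \infty} f_{t_{unit},n} = f_{t_{unit}}$
pointwise. However, $\lim_{n \to \infty} \langle\langle t_{unit} \rangle\rangle_n(\mathbb{S}) = 1 \neq 0 = \langle\langle t_{unit} \rangle\rangle(\mathbb{S})$. This shows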
that the limit may fail to hold, even for programs that terminate almost surely, 
as is the case for the program tunit in (13). In fact, this program is positively 
almost surely terminating [4] since the expected number of recursive calls to foo 
is 1. 
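
The failure of the limit can be checked with exact arithmetic. The sketch below assumes the density with at most n resamples is $2^n$ on the interval $[0, 1/2^n]$ and zero elsewhere, viewed as a function of the single uniform draw s; the function names are hypothetical.

```python
from fractions import Fraction


def f_unit(n, s):
    """Assumed density with at most n resamples: 2^n on [0, 1/2^n], else 0."""
    return Fraction(2) ** n if 0 <= s <= Fraction(1, 2 ** n) else Fraction(0)


def mass_of_f_unit(n):
    """Exact integral of f_unit(n, .) over [0, 1]: height times width."""
    return (Fraction(2) ** n) * Fraction(1, 2 ** n)
```

The total mass stays at 1 for every n, while at any fixed s > 0 the density is eventually 0: the mass concentrates on an ever smaller interval with ever larger weight, which is the typical way dominated convergence fails.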

Guided by the previous example, we now state the dominated convergence 
theorem—a fundamental result in measure theory—in the context of SMC in- 
ference in our calculus. 


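Theorem 2. Assume that $\lim_{n \to \infty} f_{t,n} = f_t$ holds pointwise $\mu_{\mathbb{S}}$-ae. Furthermore,
assume that there exists a measurable function $g : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ such
that $f_{t,n} \le g$ $\mu_{\mathbb{S}}$-ae for all n, and $\int_{\mathbb{S}} g(s) \, d\mu_{\mathbb{S}}(s) < \infty$. Then
$\lim_{n \to \infty} \langle\langle t \rangle\rangle_n = \langle\langle t \rangle\rangle$ pointwise.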


For a proof, see McDonald and Weiss [23, Theorem 4.9]. It is easy to check that 
for our example in (13), there is no dominating and integrable g as is required 
in Theorem 2. We have already seen that the conclusion of the theorem fails 
to hold here. As a corollary, if there exists a dominating and integrable g, the
measures $\langle\langle t \rangle\rangle_n$ are always finite.


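Corollary 1. If there exists a measurable function $g : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ such
that $f_{t,n} \le g$ $\mu_{\mathbb{S}}$-ae for all n, and $\int_{\mathbb{S}} g(s) \, d\mu_{\mathbb{S}}(s) < \infty$, then $\langle\langle t \rangle\rangle_n$ is finite for
each $n \in \mathbb{N}_0$.


This holds because $\langle\langle t \rangle\rangle_n(\mathbb{S}) = \int_{\mathbb{S}} f_{t,n}(s) \, d\mu_{\mathbb{S}}(s) \le \int_{\mathbb{S}} g(s) \, d\mu_{\mathbb{S}}(s) < \infty$. Hence, we
do not need to assume the finiteness of $\langle\langle t \rangle\rangle_n$ in order for Algorithm 1 to be
applicable, as was the case for the setting of Theorem 1.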

In Theorem 3, we summarize and combine the above results with Lemma 5. 


Theorem 3. Let t be a term, and apply Algorithm 1 with $\langle\langle t \rangle\rangle_n$ as $\tilde\pi_n$, and with
arbitrary valid kernels $k_n$. If the condition of Theorem 1 holds and $\langle\langle t \rangle\rangle_n$ is finite
for each $n \in \mathbb{N}_0$, then Algorithm 1 approximates $\langle\langle t \rangle\rangle$ and its normalizing constant
after a finite number of steps. Alternatively, if the condition of Theorem 2
holds, then Algorithm 1 approximates $\langle\langle t \rangle\rangle$ and its normalizing constant in the
limit $n \to \infty$.


This follows directly from Theorem 1, Theorem 2, and Lemma 5. 

We conclude this section by discussing resample placements, and the prac- 
tical implications of Theorem 3. First, we define a resample placement for a 
term t as the term resulting from replacing arbitrary subterms t’ of t with 
resample; t’. Note that such a placement directly corresponds to constructing 
the sequence $\langle\langle t \rangle\rangle_n$. Second, note that the measure $\langle\langle t \rangle\rangle$ and the target measure
$\llbracket t \rrbracket$ are clearly unaffected by such a placement: resample simply evaluates
to (), and for $\langle\langle t \rangle\rangle$ and $\llbracket t \rrbracket$, there is no bound on how many resamples
we can evaluate. As such, we conclude that all resample placements in t fulfilling
one of the two conditions in Theorem 3 lead to a correct approximation
of $\langle\langle t \rangle\rangle$ when applying Algorithm 1. Furthermore, there is always, in practice,
an upper bound on the number of calls to resample, since any concrete run of 
SMC has an (explicit or implicit) upper bound on its runtime. This is a power- 
ful result, since it implies that when implementing SMC for PPLs, any method 
for selecting resampling locations in a program is correct under mild conditions 
(Theorem 1 or Theorem 2) that are most often, if not always, fulfilled in practice. 
Most importantly, this justifies the basic approach for placing resamples found 
in WebPPL, Anglican, and Birch, in which every call to weight is directly fol- 
lowed (implicitly) by a call to resample. It also justifies the approach to placing 
resamples described in Lundén et al. [21]. This latter approach is essential in, 
e.g., Ronquist et al. [30], in order to increase inference efficiency. 

Our results also show that the restriction in Anglican, which requires all executions
to encounter the same number of resamples, is too conservative. Clearly, this is
not a requirement in either Theorem 1 or Theorem 2. For instance, the number 
of calls to resample varies significantly in (12). 


7 SMC Algorithms 


In this section, we take a look at how the kernels $k_n$ in Algorithm 1 can be
instantiated to yield the concrete SMC algorithm known as the bootstrap particle 
filter (Section 7.1), and also discuss other SMC algorithms and how they relate 
to Algorithm 1 (Section 7.2). 


7.1 The Bootstrap Particle Filter 


We define for each term t a particular sequence of kernels $k_{t,n}$ that gives rise
to the SMC algorithm known as the bootstrap particle filter (BPF). Informally,
these kernels correspond to simply continuing to evaluate the program until
either arriving at a value v or a term of the form E[resample]. For the bootstrap
kernel, calculating the weights $w_n^j$ from Algorithm 1 is particularly simple.
Similarly to $\langle\langle t \rangle\rangle_n$, it is more convenient to define and work with sequences
of kernels over traces, rather than terms. We will define $k_{t,n}(s, \cdot)$ to be the
sub-probability measure over extended traces $s * s'$ resulting from evaluating the
term $r_{t,n-1}(s)$ until the next resample or value v, ignoring any call to weight.
First, we immediately have that the set of all traces that do not have s as prefix
must have measure zero. To make this formal, we will use the inverse images of
the functions $\mathrm{prepend}_s(s') = s * s'$, $s \in \mathbb{S}$, in the definition of the kernel.


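Lemma 7. The functions $\mathrm{prepend}_s : (\mathbb{S}, \mathcal{S}) \to (\mathbb{S}, \mathcal{S})$ are measurable.


The next ingredient for defining the kernels $k_{t,n}$ is a function $p_{t,n}$ that indicates
what traces are possible when executing t until the (n + 1)th resample or value.


Definition 23.
$$p_{t,n}(s) = \begin{cases} 1 & \text{if } t, \cdot, s, n \hookrightarrow^* v, \cdot, ()_{\mathbb{S}}, \cdot \\ 1 & \text{if } t, \cdot, s, n \hookrightarrow^* E[\texttt{resample}], \cdot, ()_{\mathbb{S}}, 0 \\ 0 & \text{otherwise} \end{cases}$$


Note the similarities to Definition 9. In particular, $f_{t,n}(s) > 0$ implies $p_{t,n}(s) = 1$.
However, note that $f_{t,n}(s) = 0$ does not imply $p_{t,n}(s) = 0$, since $p_{t,n}$ ignores
weights. As an example, $f_{(\texttt{weight}\ 0),n}(()_{\mathbb{S}}) = 0$, while $p_{(\texttt{weight}\ 0),n}(()_{\mathbb{S}}) = 1$.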


Lemma 8. $p_{t,n} : (\mathbb{S}, \mathcal{S}) \to (\mathbb{R}_+, \mathcal{B}_+)$ is measurable.


The proof is analogous to that of Lemma 6. We can now formally define the
kernels $k_{t,n}$.


Definition 24. $k_{t,n}(s, S) = \int_{\mathrm{prepend}_s^{-1}(S)} p_{r_{t,n-1}(s),1}(s') \, d\mu_{\mathbb{S}}(s')$


By the definition of $p_{t,n}$, the $k_{t,n}$ are sub-probability kernels rather than probability
kernels. Intuitively, the reason for this is that during evaluation, terms can
get stuck, deterministically diverge, or even stochastically diverge. Such traces
are assigned 0 weight by $p_{t,n}$.


Lemma 9. The functions $k_{t,n} : \mathbb{S} \times \mathcal{S} \to \mathbb{R}_+$ are sub-probability kernels.¹⁷


We get a natural starting measure $q_0$ from the sub-probability distribution
resulting from running the initial program t until reaching a value or a call to
resample, ignoring weights.


Definition 25. $\langle t \rangle_0(S) = \int_S p_{t,0}(s) \, d\mu_{\mathbb{S}}(s)$.


Now we have all the ingredients for the general SMC algorithm described
in Section 5.2: a sequence of target measures $\langle\langle t \rangle\rangle_n = \tilde\pi_n$ (Definition 22), a
starting measure $\langle t \rangle_0 \propto q_0$ (Definition 25), and a sequence of kernels $k_{t,n} \propto k_n$
(Definition 24). These then induce a sequence of proposal measures $\langle t \rangle_n = \tilde q_n$ as
in Equation (10), which we instantiate in the following definition.


Definition 26. $\langle t \rangle_n(S) = \int_{\mathbb{S}} k_{t,n}(s, S) \, f_{t,n-1}(s) \, d\mu_{\mathbb{S}}(s)$, n > 0

Intuitively, the measures $\langle t \rangle_n$ are obtained by evaluating the terms in the
support of the measure $\langle\langle t \rangle\rangle_{n-1}$ until reaching the next resample or value. For
an efficient implementation, we need to factorize this definition into the history
and the current step, which amounts to splitting the traces. Each feasible trace
can be split in such a way.


¹⁷ We only give a partial proof of this lemma.
Algorithm 2 A concrete instantiation of Algorithm 1 with $\tilde\pi_n = \langle\langle t \rangle\rangle_n$, $k_n \propto k_{t,n}$,
$q_0 \propto \langle t \rangle_0$, and as a consequence $\tilde q_n = \langle t \rangle_n$ (for n > 0). In each step, we let
$1 \le j \le J$, where J is the number of samples.


1. Initialization: Set n = 0. Draw $s_0^j \sim \langle t \rangle_0$ for $1 \le j \le J$.

That is, run the program t, and draw from $U(0, 1)$ whenever required by a $\texttt{sample}_D$.
Record these draws as the trace $s_0^j$. Stop when reaching a term of the form
E[resample] or a value v. The empirical distribution $\{s_0^j\}_{j=1}^J$ approximates $\langle t \rangle_0$.

2. Correction: Calculate $w_n^j = f_{\langle\langle t \rangle\rangle_n}(s_n^j) / f_{\langle t \rangle_n}(s_n^j)$ for $1 \le j \le J$.

As a consequence of Lemma 13, this is trivial. Simply set $w_n^j$ to the weight
accumulated while running t in step (1), or $r_{t,n-1}(\hat{s}_{n-1}^j)$ in step (5). The empirical
distribution given by $\{(s_n^j, w_n^j)\}_{j=1}^J$ approximates $\langle\langle t \rangle\rangle_n / Z_{\langle\langle t \rangle\rangle_n}$.

3. Termination: If all samples $r_t(s_n^j)$ are values, terminate and output
$\{(s_n^j, w_n^j)\}_{j=1}^J$. If not, go to the next step.

We cannot evaluate values further, so running the algorithm further if all samples
are values is pointless. When terminating, assuming the conditions in Theorem 1 or
Theorem 2 hold, $\{(s_n^j, w_n^j)\}_{j=1}^J$ approximates $\langle\langle t \rangle\rangle / Z_{\langle\langle t \rangle\rangle}$. Also, by the definition
of $\llbracket t \rrbracket$, $\{(r_t(s_n^j), w_n^j)\}_{j=1}^J$ approximates $\llbracket t \rrbracket / Z_{\llbracket t \rrbracket}$, the normalized version of $\llbracket t \rrbracket$.

4. Selection: Resample the empirical distribution $\{(s_n^j, w_n^j)\}_{j=1}^J$. The new empirical
distribution is unweighted and given by $\{\hat{s}_n^j\}_{j=1}^J$. This distribution also
approximates $\langle\langle t \rangle\rangle_n / Z_{\langle\langle t \rangle\rangle_n}$.

5. Mutation: Increment n. Draw $s_n^j \sim k_{t,n}(\hat{s}_{n-1}^j, \cdot)$ for $1 \le j \le J$.

That is, simply run the intermediate program $r_{t,n-1}(\hat{s}_{n-1}^j)$, and draw from $U(0, 1)$
whenever required by a $\texttt{sample}_D$. Record these draws and append them to $\hat{s}_{n-1}^j$,
resulting in the trace $s_n^j$. Stop when reaching a term of the form E[resample] or a
value v. The empirical distribution $\{s_n^j\}_{j=1}^J$ approximates $\langle t \rangle_n / Z_{\langle t \rangle_n}$. Go to (2).
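
The steps of the bootstrap particle filter can be sketched in Python for programs written against a small, hypothetical interface (`sample`, `weight_by`, `resample`). This is not the calculus of the paper: a particle is represented by its trace of uniform draws, and the intermediate program is recovered by replaying the program on that trace past n resamples. All class and function names, and the toy model, are assumptions for illustration.

```python
import random


class Paused(Exception):
    """Raised when a run reaches the resample at which it must stop."""


class Run:
    """One replay of a program on a recorded trace of U(0, 1) draws."""

    def __init__(self, trace, skip, rng):
        self.trace = list(trace)  # copy, so resampled particles never share a list
        self.pos = 0              # next recorded draw to replay
        self.skip = skip          # resamples to pass through before stopping
        self.rng = rng
        self.weight = 1.0         # weight accumulated in the current step only

    def sample(self):
        # Replay a recorded draw, or make a fresh one and record it.
        if self.pos == len(self.trace):
            self.trace.append(self.rng.random())
        u = self.trace[self.pos]
        self.pos += 1
        return u

    def weight_by(self, w):
        # Weights before the last passed resample belong to earlier steps.
        if self.skip == 0:
            self.weight *= w

    def resample(self):
        if self.skip == 0:
            raise Paused
        self.skip -= 1


def step(program, trace, n, rng):
    """Run program on trace past n resamples, up to the next resample or value."""
    run = Run(trace, n, rng)
    try:
        value, done = program(run), True
    except Paused:
        value, done = None, False
    return run.trace, run.weight, done, value


def bootstrap_particle_filter(program, J, rng):
    traces = [[] for _ in range(J)]
    n, z_hat = 0, 1.0
    while True:
        # Initialization (n = 0) or mutation (n > 0), followed by correction.
        results = [step(program, s, n, rng) for s in traces]
        traces = [r[0] for r in results]
        weights = [r[1] for r in results]
        z_hat *= sum(weights) / J
        # Termination: all particles are values (or all weights are zero).
        if all(r[2] for r in results) or sum(weights) == 0.0:
            return [r[3] for r in results], weights, z_hat
        # Selection: multinomial resampling of the traces.
        traces = rng.choices(traces, weights=weights, k=J)
        n += 1


def coin_model(run):
    """Hypothetical toy program: two fair coin flips, heads scored with weight 2."""
    heads = 0
    for _ in range(2):
        if run.sample() < 0.5:
            heads += 1
            run.weight_by(2.0)
        run.resample()
    return heads
```

Replaying from the start of the trace at every step is quadratic in the number of steps; it is chosen here because it mirrors the trace-based definitions directly. For the toy model the exact normalizing constant is 1.5² = 2.25, which `z_hat` estimates.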


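Lemma 10. Let n > 0. If $f_{t,n}(s) > 0$, then $f_{t,n}(s) = f_{t,n-1}(\underline{s}) \, f_{r_{t,n-1}(\underline{s}),1}(\overline{s})$ for
exactly one decomposition $\underline{s} * \overline{s} = s$. If $f_{t,n}(s) = 0$, then $f_{t,n-1}(\underline{s}) \, f_{r_{t,n-1}(\underline{s}),1}(\overline{s}) = 0$
for all decompositions $\underline{s} * \overline{s} = s$. As a consequence, if $f_{t,n}(s) > 0$, then
$p_{r_{t,n-1}(\underline{s}),1}(\overline{s}) = 1$.

This gives a more efficiently computable definition of the density.

Lemma 11. For $n \in \mathbb{N}$, $\langle t \rangle_n(S) = \int_S f_{t,n-1}(\underline{s}) \, p_{r_{t,n-1}(\underline{s}),1}(\overline{s}) \, d\mu_{\mathbb{S}}(s)$, where
$\underline{s} * \overline{s} = s$ is the unique decomposition from Lemma 10.¹⁸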


Since the kernels $k_{t,n}$ are sub-probability kernels, the measures $\langle t \rangle_n$ are finite
given that the $\langle\langle t \rangle\rangle_n$ are finite.


Lemma 12. $\langle t \rangle_0$ is a sub-probability measure. Also, if $\langle\langle t \rangle\rangle_{n-1}$ is finite, then
$\langle t \rangle_n$ is finite.


As discussed in Section 6.2, the $\langle\langle t \rangle\rangle_n$ are finite, either by assumption (Theorem 1)
or as a consequence of the dominating function of Theorem 2. From this


¹⁸ We only give a proof sketch for this lemma.
and Lemma 12, the $\langle t \rangle_n$ are also finite. Furthermore, checking that the $\langle t \rangle_n$ are
valid, i.e. that the density $f_{\langle t \rangle_n}$ of each $\langle t \rangle_n$ covers the density $f_{\langle\langle t \rangle\rangle_n}$ of $\langle\langle t \rangle\rangle_n$,
is trivial. As such, by Lemma 5, we can now correctly approximate $\langle\langle t \rangle\rangle_n$ using
Algorithm 1. The details are given in Algorithm 2, which closely resembles the
standard SMC algorithm in WebPPL. For ease of notation, we assume it possible
to draw samples from $\langle t \rangle_0$ and $k_{t,n}(s, \cdot)$, even though these are sub-probability
measures. This essentially corresponds to assuming evaluation never gets stuck
or diverges. Making sure this is the case is not within the scope of this paper.
The weights in Algorithm 2 at time step n can easily be calculated according to
the following lemma.


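Lemma 13.
$$w_n(s) = \frac{f_{\langle\langle t \rangle\rangle_n}(s)}{f_{\langle t \rangle_n}(s)} = \begin{cases} f_{t,0}(s) & \text{if } n = 0 \\ f_{r_{t,n-1}(\underline{s}),1}(\overline{s}) & \text{if } n > 0 \end{cases} \quad \text{if } f_{\langle t \rangle_n}(s) > 0.$$


Here, $\underline{s} * \overline{s} = s$ is the unique decomposition from Lemma 10.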


7.2 Other SMC Algorithms 


In this section, we discuss SMC algorithms other than the BPF. 

First, we have the resample-move algorithm by Gilks and Berzuini [11], which 
is also implemented in WebPPL [13], and treated by Chopin [6] and Ścibior et 
al. [33]. In this algorithm, the SMC kernel is composed with a suitable MCMC 
kernel, such that one or more MCMC steps are taken for each sample after 
each resampling. This helps with the so-called degeneracy problem in SMC, 
which refers to the tendency of SMC samples to share a common ancestry as a 
result of resampling. We can directly achieve this algorithm in our context by 
simply choosing appropriate transition kernels in Algorithm 1. Let $k_{MCMC,n}$ be
MCMC transition kernels with $\tilde\pi_{n-1} = \langle\langle t \rangle\rangle_{n-1}$ as invariant distributions. Using
the bootstrap kernels as the main kernels, we let $k_n = k_{t,n} \circ k_{MCMC,n}$, where $\circ$
denotes kernel composition. The sequence $k_n$ is valid because of the validity of
the main SMC kernels and the invariance of the MCMC kernels.

While Algorithm 1 captures different SMC algorithms by allowing the use of 
different kernels, some algorithms require changes to Algorithm 1 itself. The first 
such variation of Algorithm 1 is the alive particle filter, recently discussed by 
Kudlicka et al. [19], which reduces the tendency to degeneracy by not including 
sample traces with zero weight in resampling. This is done by repeating the 
selection and mutation steps (for each sample individually) until a trace with 
non-zero weight is proposed; the corresponding modifications to Algorithm 1 are 
straightforward. The unbiasedness result of Kudlicka et al. [19] can easily be 
extended to our PPL context, with another minor modification to Algorithm 1. 

Another variation of Algorithm 1 is the auxiliary particle filter [28]. Infor- 
mally, this algorithm allows the selection and mutation steps of Algorithm 1 to 
be guided by future information regarding the weights $w_n$. For many models,
this is possible since the weighting functions $w_n$ from Algorithm 1 are often
parametric in an explicitly available sequence of observation data points, which
can also be used to derive better kernels $k_n$. Clearly, such optimizations are
model-specific, and cannot directly be applied in expressive PPL calculi such as
ours. However, the general idea of using look-ahead in general-purpose PPLs to
guide selection and mutation is interesting, and should be explored.


8 Related Work 


The only major previous work related to formal SMC correctness in PPLs is
Ścibior et al. [33] (see Section 1). They validate both the BPF and the resample-move
SMC algorithms in a denotational setting. In a companion paper, Ścibior
et al. [32] also give a Haskell implementation of these inference techniques.

Although formal correctness proofs of SMC in PPLs are sparse, there are
many languages that implement SMC algorithms. Goodman and Stuhlmüller [14]
describe SMC for the probabilistic programming language WebPPL. They implement
a basic BPF very similar to Algorithm 2, but do not show correctness
with respect to any language semantics. Also, related to WebPPL, Stuhlmüller
et al. [36] discuss a coarse-to-fine SMC inference technique for probabilistic programs
with independent sample statements.

Wood et al. [43] describe PMCMC, an MCMC inference technique that uses 
SMC internally, for the probabilistic programming language Anglican [37]. Sim- 
ilarly to WebPPL, Anglican also includes a basic BPF similar to Algorithm 2, 
with the exception that every execution needs to encounter the same number of 
calls to resample. They use various types of empirical tests to validate correct- 
ness, in contrast to the formal proof found in this paper. Related to Anglican, 
a brief discussion on resample placement requirements can be found in van de 
Meent et al. [41]. 

Birch [25] is an imperative object-oriented PPL, with a particular focus 
on SMC. It supports a number of SMC algorithms, including the BPF [16] 
and the auxiliary particle filter [28]. Furthermore, they support dynamic an- 
alytical optimizations, for instance using locally-optimal proposals and Rao- 
Blackwellization [24]. As with WebPPL and Anglican, the focus is on perfor- 
mance and efficiency, and not on formal correctness. 

There are quite a few papers studying the correctness of MCMC algorithms
for PPLs. Using the same underlying framework as for their SMC correctness
proof, Ścibior et al. [33] also validate a trace MCMC algorithm. Another proof
of correctness for trace MCMC is given in Borgström et al. [3], which instead
uses an untyped lambda calculus and an operational semantics. Much of the
formalization in this paper is based on constructions used as part of their paper.
For instance, the functions $f_t$ and $r_t$ are defined similarly, as well as the measure
space $(\mathbb{S}, \mathcal{S}, \mu_{\mathbb{S}})$ and the measurable space $(\mathbb{T}, \mathcal{T})$. Our measurability proofs of
$f_t$, $r_t$, $f_{t,n}$, and $r_{t,n}$ largely follow the same strategies as found in their paper.
Similarly to us, they also relate their proof of correctness to classical results from
the MCMC literature. A difference is that we use inverse transform sampling,
whereas they use probability density functions. As a result of this, our traces
consist of numbers on [0, 1], while their traces consist of numbers on $\mathbb{R}$. Also,
inverse transform sampling naturally allows for built-in discrete distributions.
In contrast, discrete distributions must be encoded in the language itself when
using probability densities. Another difference is that they restrict the arguments
to weight to [0, 1], in order to ensure the finiteness of the target measure.

Other work related to ours includes Jacobs [17], Vákár et al. [39], and Staton et
al. [35]. Jacobs [17] discusses problems with models in which observe (related
to weight) statements occur conditionally. While our results show that SMC
inference for such models is correct, the models themselves may not be useful.
Vákár et al. [39] develop a powerful domain theory for term recursion in PPLs,
but do not cover SMC inference in particular. Staton et al. [35] develop both
operational and denotational semantics for a PPL calculus with higher-order
functions, but without recursion. They also briefly mention SMC as a program
transformation.

Classical work on SMC includes Chopin [6], which we use as a basis for our 
formalization. In particular, Chopin [6] provides a general formulation of SMC, 
placing few requirements on the underlying model. The book by Del Moral [7] 
contains a vast number of classical SMC results, including the law of large num- 
bers and unbiasedness result from Lemma 5. A more accessible summary of the 
important SMC convergence results from Del Moral [7] can be found in Naesseth 
et al. [26]. 


9 Conclusions 


In conclusion, we have formalized SMC inference for an expressive functional 
PPL calculus, based on the formalization by Chopin [6]. We showed that in this 
context, SMC is correct in that it approximates the target measures encoded 
by programs in the calculus under mild conditions. Furthermore, we illustrated 
a particular instance of SMC for our calculus, the bootstrap particle filter, and 
discussed other variations of SMC and their relation to our calculus. 

As indicated in Section 2, the approach used for selecting resampling locations 
can have a large impact on SMC accuracy and performance. This leads us to 
the following general question: can we select optimal resampling locations in a 
given program, according to some formally defined measure of optimality? We 
leave this important research direction for future work. 
Abstract. We study the differential properties of higher-order statistical
probabilistic programs with recursion and conditioning. Our starting
point is an open problem posed by Hongseok Yang: what class of sta- 
tistical probabilistic programs have densities that are differentiable al- 
most everywhere? To formalise the problem, we consider Statistical PCF 
(SPCF), an extension of call-by-value PCF with real numbers, and con- 
structs for sampling and conditioning. We give SPCF a sampling-style 
operational semantics à la Borgström et al., and study the associated 
weight (commonly referred to as the density) function and value func- 
tion on the set of possible execution traces. 

Our main result is that almost surely terminating SPCF programs, gen- 
erated from a set of primitive functions (e.g. the set of analytic functions) 
satisfying mild closure properties, have weight and value functions that 
are almost everywhere differentiable. We use a stochastic form of sym- 
bolic execution to reason about almost everywhere differentiability. A 
by-product of this work is that almost surely terminating deterministic 
(S)PCF programs with real parameters denote functions that are almost 
everywhere differentiable. 

Our result is of practical interest, as almost everywhere differentiability 
of the density function is required to hold for the correctness of major 
gradient-based inference algorithms. 


1 Introduction 


Probabilistic programming refers to a set of tools and techniques for the system- 
atic use of programming languages in Bayesian statistical modelling. Users of 
probabilistic programming — those wishing to make inferences or predictions 
— (i) encode their domain knowledge in program form; (ii) condition certain 
program variables based on observed data; and (iii) make a query. The resulting 
code is then passed to an inference engine which performs the necessary com- 
putation to answer the query, usually following a generic approximate Bayesian 
inference algorithm. (In some recent systems [5,14], users may also write their 
own inference code.) The Programming Language community has contributed to 
the field by developing formal methods for probabilistic programming languages 
(PPLs), seen as usual languages enriched with primitives for (i) sampling and 
(ii) conditioning. (The query (iii) can usually be encoded as the return value of
the program.) 

It is crucial to have access to reasoning principles in this context. The com- 
bination of these new primitives with the traditional constructs of programming 
languages leads to a variety of new computational phenomena, and a major con- 
cern is the correctness of inference: given a query, will the algorithm converge, 
in some appropriate sense, to a correct answer? In a universal PPL (i.e. one 
whose underlying language is Turing-complete), this is not obvious: the infer- 
ence engine must account for a wide class of programs, going beyond the more 
well-behaved models found in many of the current statistical applications. Thus 
the design of inference algorithms, and the associated correctness proofs, are 
quite delicate. It is well-known, for instance, that in its original version the pop- 
ular lightweight Metropolis-Hastings algorithm [53] contained a bug affecting the 
result of inference [20,25]. 

Fortunately, research in this area benefits from decades of work on the seman- 
tics of programs with random features, starting with pioneering work by Kozen 
[26] and Saheb-Djahromi [44]. Both operational and denotational models have 
recently been applied to the validation of inference algorithms: see e.g. [20,8] 
for the former and [45,10] for the latter. There are other approaches, e.g. using 
refined type systems [33]. 

Inference algorithms in probabilistic programming are often based on the 
concept of program trace, because the operational behaviour of a program is 
parametrised by the sequence of random numbers it draws along the way. Ac- 
cordingly a probabilistic program has an associated value function which maps 
traces to output values. But the inference procedure relies on another function on 
traces, commonly called the density¹ of the program, which records a cumulative
likelihood for the samples in a given trace. Approximating a normalised version 
of the density is the main challenge that inference algorithms aim to tackle. We 
will formalise these notions: in Sec. 3 we demonstrate how the value function 
and density of a program are defined in terms of its operational semantics. 


Contributions. The main result of this paper is that both the density and 
value function are differentiable almost everywhere (that is, everywhere but on 
a set of measure zero), provided the program is almost surely terminating in 
a suitable sense. Our result holds for a universal language with recursion and 
higher-order functions. We emphasise that it follows immediately that purely 
deterministic programs with real parameters denote functions that are almost 
everywhere differentiable. This class of programs is important, because they can 
express machine learning models which rely on gradient descent [30]. 

This result is of practical interest, because many modern inference algo- 
rithms are “gradient-based”: they exploit the derivative of the density function 
in order to optimise the approximation process. This includes the well-known 
methods of Hamiltonian Monte Carlo [15,37] and stochastic variational inference
[18,40,6,27]. But these techniques can only be applied when the derivative
exists “often enough”, and thus, in the context of probabilistic programming,
almost everywhere differentiability is often cited as a requirement for correctness
[55,31]. The question of which probabilistic programs satisfy this property was
selected by Hongseok Yang in his FSCD 2019 invited lecture [54] as one of three
open problems in the field of semantics for probabilistic programs.

¹ For some readers this terminology may be ambiguous; see Remark 1 for clarification.

Points of non-differentiability exist largely because of branching, which typi- 
cally arises in a program when the control flow reaches a conditional statement. 
Hence our work is a study of the connections between the traces of a probabilis- 
tic program and its branching structure. To achieve this we introduce stochastic 
symbolic execution, a form of operational semantics for probabilistic programs, 
designed to identify sets of traces corresponding to the same control-flow branch. 
Roughly, a reduction sequence in this semantics corresponds to a control flow 
branch, and the rules additionally provide for every branch a symbolic expres- 
sion of the trace density, parametrised by the outcome of the random draws that 
the branch contains. We obtain our main result in conjunction with a careful 
analysis of the branching structure of almost surely terminating programs. 


Outline. We devote Sec. 2 to a more detailed introduction to the problem of 
trace-based inference in probabilistic programming, and the issue of differentia- 
bility in this context. In Sec. 3, we present a trace-based operational semantics 
to Statistical PCF, a prototypical higher-order functional language previously 
studied in the literature. This is followed by a discussion of differentiability and 
almost sure termination of programs (Sec. 4). In Sec. 5 we define the “symbolic” 
operational semantics required for the proof of our main result, which we present 
in Sec. 6. We discuss related work and further directions in Sec. 7. 
For the extended version of the paper refer to [34]. 


2 Probabilistic Programming and Trace-Based Inference 


In this section we give a short introduction to probabilistic programs and the 
densities they denote, and we motivate the need for gradient-based inference 
methods. Our account relies on classical notions from measure theory, so we 
start with a short recap. 


2.1 Measures and Densities 


A measurable space is a pair $(X, \Sigma_X)$ consisting of a set together with a
$\sigma$-algebra of subsets, i.e. $\Sigma_X \subseteq \mathcal{P}(X)$ contains $\emptyset$ and is
closed under complements and countable unions and intersections. Elements of
$\Sigma_X$ are called measurable sets. A measure on $(X, \Sigma_X)$ is a function
$\mu : \Sigma_X \to [0, \infty]$ satisfying $\mu(\emptyset) = 0$, and
$\mu(\bigcup_{i \in I} U_i) = \sum_{i \in I} \mu(U_i)$ for every countable family
$\{U_i\}_{i \in I}$ of pairwise disjoint measurable subsets. A (possibly partial)
function $f : X \to Y$ is measurable if for every $U \in \Sigma_Y$ we have
$f^{-1}(U) \in \Sigma_X$.

The space $\mathbb{R}$ of real numbers is an important example. The (Borel)
$\sigma$-algebra $\Sigma_{\mathbb{R}}$ is the smallest one containing all intervals $[a, b)$,
and the Lebesgue measure $\mathrm{Leb}$ is the unique measure on
$(\mathbb{R}, \Sigma_{\mathbb{R}})$ satisfying $\mathrm{Leb}([a, b)) = b - a$. For measurable
spaces $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$, the product $\sigma$-algebra $\Sigma_{X \times Y}$
is the smallest one containing all $U \times V$, where $U \in \Sigma_X$ and
$V \in \Sigma_Y$. So in particular we get for each $n \in \mathbb{N}$ a space
$(\mathbb{R}^n, \Sigma_{\mathbb{R}^n})$, and additionally there is a unique measure
$\mathrm{Leb}_n$ on $\mathbb{R}^n$ satisfying
$\mathrm{Leb}_n(\prod_i U_i) = \prod_i \mathrm{Leb}(U_i)$.

When a function $f : X \to \mathbb{R}$ is measurable and non-negative and $\mu$ is a
measure on $X$, for each $U \in \Sigma_X$ we can define the integral
$\int_U (d\mu)\, f \in [0, \infty]$. Common families of probability distributions on
the reals (Uniform, Normal, etc.) are examples of measures on
$(\mathbb{R}, \Sigma_{\mathbb{R}})$. Most often these are defined in terms of probability
density functions with respect to the Lebesgue measure, meaning that for each
$\mu_D$ there is a measurable function $\mathrm{pdf}_D : \mathbb{R} \to \mathbb{R}_{\ge 0}$ which
determines it: $\mu_D(U) = \int_U (d\mathrm{Leb})\, \mathrm{pdf}_D$. As we will see, density
functions such as $\mathrm{pdf}_D$ have a central place in Bayesian inference.

Formally, if $\mu$ is a measure on a measurable space $X$, a density for $\mu$ with
respect to another measure $\nu$ on $X$ (most often $\nu$ is the Lebesgue measure) is a
measurable function $f : X \to \mathbb{R}$ such that $\mu(U) = \int_U (d\nu)\, f$ for every
$U \in \Sigma_X$. In the context of the present work, an inference algorithm can be
understood as a method for approximating a distribution of which we only know
the density up to a normalising constant. In other words, if the algorithm is fed
a (measurable) function $g : X \to \mathbb{R}$, it should produce samples approximating
the probability measure

$$U \mapsto \frac{\int_U (d\nu)\, g}{\int_X (d\nu)\, g}$$

on $X$.


We will make use of some basic notions from topology: given a topological
space $X$ and a set $A \subseteq X$, the interior of $A$ is the largest open set
$\mathring{A}$ contained in $A$. Dually the closure of $A$ is the smallest closed set
$\overline{A}$ containing $A$, and the boundary of $A$ is defined as
$\partial A := \overline{A} \setminus \mathring{A}$. Note that for all
$U \subseteq \mathbb{R}^n$, all of $\mathring{U}$, $\overline{U}$ and $\partial U$ are
measurable (in $\Sigma_{\mathbb{R}^n}$).


2.2 Probabilistic Programming: a (Running) Example 


Our running example is based on a random walk in $\mathbb{R}_{\ge 0}$.

The story is as follows: a pedestrian has gone on a walk on a certain semi- 
infinite street (i.e. extending infinitely on one side), where she may periodically 
change directions. Upon reaching the end of the street she has forgotten her 
starting point, only remembering that she started no more than 3km away. 
Thanks to an odometer, she knows the total distance she has walked is 1.1km, 
although there is a small margin of error. Her starting point can be inferred 
using probabilistic programming, via the program in Fig. 1a.

The function walk in Fig. 1a is a recursive simulation of the random walk:
note that in this model a new direction is sampled after at most 1km. Once the 
pedestrian has travelled past 0 the function returns the total distance travelled. 
The rest of the program first specifies a prior distribution for the starting point, 
representing the pedestrian’s belief — uniform distribution on [0,3] — before 
observing the distance measured by the odometer. After drawing a value for 
start the program simulates a random walk, and the execution is weighted (via 
score) according to how close distance is to the observed value of 1.1. The return 
    (* returns total distance travelled *)
    let rec walk start =
      if (start <= 0) then
        0
      else
        (* each leg < 1km *)
        let step = Uniform(0, 1) in
        if (flip ()) then
          (* go towards +infty *)
          step + walk (start+step)
        else
          (* go towards 0 *)
          step + walk (start-step)
    in
    (* prior *)
    let start = Uniform(0, 3) in
    let distance = walk start in
    (* likelihood *)
    score ((pdfN distance 0.1) 1.1);
    (* query *)
    start

(a) Running example in pseudo-code. (b) Resulting histogram (horizontal axis: Starting Location, from 0 to 3).

Fig. 1: Inferring the starting point of a random walk on $\mathbb{R}_{\ge 0}$, in a PPL.


value is our query: it indicates that we are interested in the posterior distribution 
on the starting point. 

The histogram in Fig. 1b is obtained by sampling repeatedly from the pos- 
terior of a Python model of our running example. It shows the mode of the 
pedestrian’s starting point to be around the 0.8km mark. 

To approximate the posterior, inference engines for probabilistic programs 
often proceed indirectly and operate on the space of program traces, rather than 
on the space of possible return values. By trace, we mean the sequence of sam- 
ples drawn in the course of a particular run, one for each random primitive 
encountered. Because each random primitive (qua probability distribution) in 
the language comes with a density, given a particular trace we can compute a 
coefficient as the appropriate product. We can then multiply this coefficient by 
all scores encountered in the execution, and this yields a (weight) function, map- 
ping traces to the non-negative reals, over which the chosen inference algorithm 
may operate. This indirect approach is more practical, and enough to answer 
the query, since every trace unambiguously induces a return value. 


Remark 1. In much of the probabilistic programming literature (e.g. [31,55,54], 
including this paper), the above-mentioned weight function on traces is called the 
density of the probabilistic program. This may be confusing: as we have seen, 
a probabilistic program induces a posterior probability distribution on return 
values, and it is natural to ask whether this distribution admits a probability
density function (Radon-Nikodym derivative) w.r.t. some base measure. This 
problem is of current interest [2,3,21] but unrelated to the present work. 


2.3 Gradient-Based Approximate Inference 


Some of the most influential and practically important inference algorithms make 
use of the gradient of the density functions they operate on, when these are 
differentiable. Generally the use of gradient-based techniques allows for much
greater efficiency in inference.

A popular example is the Markov Chain Monte Carlo algorithm known as
Hamiltonian Monte Carlo (HMC) [15,37]. Given a density function
$g : X \to \mathbb{R}$, HMC samples are obtained as the states of a Markov chain by
(approximately) simulating Hamilton's equations via an integrator that uses the
gradient $\nabla_x\, g(x)$. Another important example is (stochastic) variational
inference [18,40,6,27], which transforms the posterior inference problem to an
optimisation problem. This method takes two inputs: the posterior density
function of interest $g : X \to \mathbb{R}$, and a function
$h : \Theta \times X \to \mathbb{R}$; typically, the latter function is a member of an
expressive and mathematically well-behaved family of densities that are
parameterised in $\Theta$. The idea is to use stochastic gradient descent to find the
parameter $\theta \in \Theta$ that minimises the “distance” (typically the
Kullback-Leibler divergence) between $h(\theta, -)$ and $g$, relying on a suitable
estimate of the gradient of the objective function. When $g$ is the density of a
probabilistic program (the model), $h$ can be specified as the density of a second
program (the guide) whose traces have additional $\theta$-parameters. The gradient
of the objective function is then estimated in one approach (score function [41])
by computing the gradient $\nabla_\theta\, h(\theta, x)$, and in another
(reparameterised gradient [24,42,49]) by computing the gradient $\nabla_x\, g(x)$.
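
To make the role of the gradient concrete, the following is a minimal sketch of a leapfrog integrator, the textbook scheme commonly used inside HMC. It is not taken from this paper: the function names, the one-dimensional setting and the standard normal target are all assumptions chosen for illustration.

```python
def leapfrog(q, p, grad_log_g, step_size, n_steps):
    """Approximate Hamiltonian dynamics for H(q, p) = -log g(q) + p*p/2."""
    p = p + 0.5 * step_size * grad_log_g(q)      # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                    # full step for position
        if i < n_steps - 1:
            p = p + step_size * grad_log_g(q)    # full step for momentum
    p = p + 0.5 * step_size * grad_log_g(q)      # final half step for momentum
    return q, p


def grad_log_std_normal(q):
    """Gradient of log g for the unnormalised density g(q) = exp(-q*q/2)."""
    return -q


def hamiltonian(q, p):
    """Total energy for the standard normal target (illustrative only)."""
    return 0.5 * q * q + 0.5 * p * p


if __name__ == "__main__":
    q1, p1 = leapfrog(1.0, 0.5, grad_log_std_normal, 0.1, 20)
    print(q1, p1, hamiltonian(q1, p1) - hamiltonian(1.0, 0.5))
```

The integrator only ever queries the gradient of the log-density, which is why the scheme presupposes that this gradient exists at (almost) every point it visits.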

In probabilistic programming, the above inference methods must be adapted 
to deal with the fact that in a universal PPL, the set of random primitives 
encountered can vary between executions, and traces can have arbitrary and un- 
bounded dimension; moreover, the density function of a probabilistic program is 
generally not (everywhere) differentiable. Crucially these adapted algorithms 
are only valid when the input densities are almost everywhere differentiable 
[55,38,32]; this is the subject of this paper. 


Our main result (Thm. 3) states that the weight function and value function
of almost surely terminating SPCF programs are almost everywhere differentiable.
This applies to our running example: the program in Fig. 1a (expressible
in SPCF using primitive functions that satisfy Assumption 1; see Ex. 1) is
almost surely terminating.


3 Sampling Semantics for Statistical PCF 


In this section, we present a simply-typed statistical probabilistic programming 
language with recursion and its operational semantics. 
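$$\sigma, \tau ::= R \mid \sigma \to \tau$$

$$M, N, L ::= y \mid \underline{r} \mid \underline{f}(M_1, \ldots, M_\ell) \mid \lambda y.\,M \mid M\,N \mid \mathsf{Y} M \mid \mathsf{if}(L \le 0, M, N) \mid \mathsf{sample} \mid \mathsf{score}(M)$$

$$\frac{}{\Gamma \vdash \mathsf{sample} : R} \qquad \frac{\Gamma \vdash M : R}{\Gamma \vdash \mathsf{score}(M) : R} \qquad \frac{\Gamma \vdash M : (\sigma \to \tau) \to (\sigma \to \tau)}{\Gamma \vdash \mathsf{Y} M : \sigma \to \tau}$$

Fig. 2: Syntax of SPCF, where $r \in \mathbb{R}$, $x, y$ are variables, and
$f : \mathbb{R}^\ell \rightharpoonup \mathbb{R}$ ranges over a set $\mathcal{F}$ of partial,
measurable primitive functions (see Sec. 4.2).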


3.1 Statistical PCF 


Statistical PCF (SPCF) is higher-order probabilistic programming with
recursion in purified form. The terms and part of the (standard) typing system of
SPCF are presented in Fig. 2². In the rest of the paper we write $\boldsymbol{x}$ to
represent a sequence of variables $x_1, \ldots, x_n$, $\Lambda$ for the set of SPCF
terms, and $\Lambda^0$ for the set of closed SPCF terms. In the interest of
readability, we sometimes use pseudo code (e.g. Fig. 1a) in the style of Core ML
to express SPCF terms.

SPCF is a statistical probabilistic version of call-by-value PCF [46,47] with
reals as the ground type. The probabilistic constructs of SPCF are relatively
standard (see for example [48]): the sampling construct sample draws from
$\mathcal{U}(0, 1)$, the standard uniform distribution with end points 0 and 1; the
scoring construct score(M) enables conditioning on observed data by multiplying
the weight of the current execution with the (non-negative) real number denoted
by $M$. Sampling from other real-valued distributions can be obtained from
$\mathcal{U}(0, 1)$ by applying the inverse of the distribution's cumulative
distribution function.
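
As an illustration of this last point, the sketch below obtains three familiar distributions from a single draw `u` of the standard uniform distribution. The helper names are hypothetical; `statistics.NormalDist.inv_cdf` is the inverse normal CDF from the Python standard library.

```python
import math
from statistics import NormalDist


def sample_uniform(a, b, u):
    """Uniform(a, b): the inverse CDF is a + (b - a) * u."""
    return a + (b - a) * u


def sample_exponential(rate, u):
    """Exponential(rate): the CDF is 1 - exp(-rate * x); invert it."""
    return -math.log(1.0 - u) / rate


def sample_normal(mean, std, u):
    """Normal(mean, std): use the library's inverse CDF."""
    return NormalDist(mean, std).inv_cdf(u)


if __name__ == "__main__":
    u = 0.2  # stands in for one evaluation of `sample`, so u lies in (0, 1)
    print(sample_uniform(0, 3, u))
    print(sample_exponential(2.0, u))
    print(sample_normal(1.1, 0.1, u))
```

The first helper is exactly how the prior Uniform(0, 3) of the running example can be encoded with a single sample.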

Our SPCF is an (inconsequential) variant of CBV SPCF [51] and a (CBV) 
extension of PPCF [16] with scoring; it may be viewed as a simply-typed version 
of the untyped probabilistic languages of [8,13,52]. 


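Example 1 (Running Example Ped). We express in SPCF the example in Fig. 1a.

$$\mathrm{Ped} \equiv \left(\begin{array}{l}
\mathsf{let}\ x = \mathsf{sample} \cdot 3\ \mathsf{in} \\
\mathsf{let}\ d = \mathrm{walk}\ x\ \mathsf{in} \\
\mathsf{let}\ w = \mathsf{score}(\mathrm{pdf}_{\mathcal{N}(1.1,0.1)}(d))\ \mathsf{in}\ x
\end{array}\right) \quad \text{where}$$

$$\mathrm{walk} \equiv \mathsf{Y}\left(\begin{array}{l}
\lambda f\,x.\ \mathsf{if}\ x \le 0\ \mathsf{then}\ 0 \\
\quad \mathsf{else}\ \left(\begin{array}{l}
\mathsf{let}\ s = \mathsf{sample}\ \mathsf{in} \\
\mathsf{if}((\mathsf{sample} \le 0.5),\ (s + f(x + s)),\ (s + f(x - s)))
\end{array}\right)
\end{array}\right)$$

The let construct, $\mathsf{let}\ x = N\ \mathsf{in}\ M$, is syntactic sugar for the term
$(\lambda x.\,M)\,N$; and $\mathrm{pdf}_{\mathcal{N}(1.1,0.1)}$, the density function of the
normal distribution with mean 1.1 and variance 0.1, is a primitive function. To
enhance readability we use infix notation and omit the underline for standard
functions such as addition.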


² In Fig. 2 and in other figures, we highlight the elements that are new or otherwise
noteworthy. 
3.2 Operational Semantics


The execution of a probabilistic program generates a trace: a sequence contain- 
ing the values sampled during a run. Our operational semantics captures this 
dynamic perspective. This is closely related to the treatment in [8] which, follow- 
ing [26], views a probabilistic program as a deterministic program parametrized 
by the sequence of random draws made during the evaluation. 


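Traces. Recall that in our language, sample produces a random value in the
open unit interval; accordingly a trace is a finite sequence of elements of $(0, 1)$.
We define a measure space $S$ of traces to be the set
$\bigcup_{n \in \mathbb{N}} (0, 1)^n$, equipped with the standard disjoint union
$\sigma$-algebra, and the sum of the respective (higher-dimensional) Lebesgue
measures. Formally, writing $S_n := (0, 1)^n$, we define:

$$S := \Big( \bigcup_{n \in \mathbb{N}} S_n,\ \Big\{ \bigcup_{n \in \mathbb{N}} U_n \ \Big|\ U_n \in \Sigma_{S_n} \Big\},\ \mu_S \Big) \quad \text{and} \quad \mu_S\Big( \bigcup_{n \in \mathbb{N}} U_n \Big) := \sum_{n \in \mathbb{N}} \mathrm{Leb}_n(U_n).$$

Henceforth we write traces as lists, such as [0.5, 0.999, 0.12]; the empty trace as
[]; and the concatenation of traces $s, s' \in S$ as $s \mathbin{+\!\!+} s'$.

More generally, to account for open terms, we define, for each $m \in \mathbb{N}$, the
measure space

$$\mathbb{R}^m \times S := \Big( \bigcup_{n \in \mathbb{N}} \mathbb{R}^m \times S_n,\ \Big\{ \bigcup_{n \in \mathbb{N}} V_n \ \Big|\ V_n \in \Sigma_{\mathbb{R}^m \times S_n} \Big\},\ \mu_{\mathbb{R}^m \times S} \Big)$$

where $\mu_{\mathbb{R}^m \times S}(\bigcup_{n \in \mathbb{N}} V_n) := \sum_{n \in \mathbb{N}} \mathrm{Leb}_{m+n}(V_n)$.
To avoid clutter, we will elide the subscript from $\mu_{\mathbb{R}^m \times S}$ whenever
it is clear from the context.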


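Small-Step Reduction. Next, we define the values (typically denoted $V$),
redexes (typically $R$) and evaluation contexts (typically $E$):

$$\begin{aligned}
V &::= \underline{r} \mid \lambda y.\,M \\
R &::= (\lambda y.\,M)\,V \mid \underline{f}(\underline{r_1}, \ldots, \underline{r_\ell}) \mid \mathsf{Y}(\lambda y.\,M) \mid \mathsf{if}(\underline{r} \le 0, M, N) \mid \mathsf{sample} \mid \mathsf{score}(\underline{r}) \\
E &::= [\,] \mid E\,M \mid (\lambda y.\,M)\,E \mid \underline{f}(\underline{r_1}, \ldots, \underline{r_{i-1}}, E, M_{i+1}, \ldots, M_\ell) \mid \mathsf{Y} E \\
  &\quad \mid \mathsf{if}(E \le 0, M, N) \mid \mathsf{score}(E)
\end{aligned}$$

We write $\Lambda_v$ for the set of SPCF values, and $\Lambda^0_v$ for the set of closed
SPCF values.

It is easy to see that every closed SPCF term $M$ is either a value, or there
exists a unique pair of context $E$ and redex $R$ such that $M \equiv E[R]$.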

We now present the operational semantics of SPCF as a rewrite system of
configurations, which are triples of the form $\langle M, w, s \rangle$ where $M$ is a
closed SPCF term, $w \in \mathbb{R}_{\ge 0}$ is a weight, and $s \in S$ a trace. (We will sometimes refer to
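Redex Contractions:

$$\begin{aligned}
\langle (\lambda y.\,M)\,V,\ w,\ s \rangle &\to \langle M[V/y],\ w,\ s \rangle \\
\langle \underline{f}(\underline{r_1}, \ldots, \underline{r_\ell}),\ w,\ s \rangle &\to \langle \underline{f(r_1, \ldots, r_\ell)},\ w,\ s \rangle && (\text{if } (r_1, \ldots, r_\ell) \in \mathrm{dom}(f)) \\
\langle \underline{f}(\underline{r_1}, \ldots, \underline{r_\ell}),\ w,\ s \rangle &\to \mathsf{fail} && (\text{if } (r_1, \ldots, r_\ell) \notin \mathrm{dom}(f)) \\
\langle \mathsf{Y}(\lambda y.\,M),\ w,\ s \rangle &\to \langle \lambda z.\,M[\mathsf{Y}(\lambda y.\,M)/y]\,z,\ w,\ s \rangle && (\text{for fresh variable } z) \\
\langle \mathsf{if}(\underline{r} \le 0, M, N),\ w,\ s \rangle &\to \langle M,\ w,\ s \rangle && (\text{if } r \le 0) \\
\langle \mathsf{if}(\underline{r} \le 0, M, N),\ w,\ s \rangle &\to \langle N,\ w,\ s \rangle && (\text{if } r > 0) \\
\langle \mathsf{sample},\ w,\ s \rangle &\to \langle \underline{r},\ w,\ s \mathbin{+\!\!+} [r] \rangle && (\text{for some } r \in (0, 1)) \\
\langle \mathsf{score}(\underline{r}),\ w,\ s \rangle &\to \langle \underline{r},\ r \cdot w,\ s \rangle && (\text{if } r \ge 0) \\
\langle \mathsf{score}(\underline{r}),\ w,\ s \rangle &\to \mathsf{fail} && (\text{if } r < 0)
\end{aligned}$$

Evaluation Contexts:

$$\frac{\langle R, w, s \rangle \to \langle R', w', s' \rangle}{\langle E[R], w, s \rangle \to \langle E[R'], w', s' \rangle} \qquad \frac{\langle R, w, s \rangle \to \mathsf{fail}}{\langle E[R], w, s \rangle \to \mathsf{fail}}$$

Fig. 3: Operational small-step semantics of SPCF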


these as the concrete configurations, in contrast with the abstract configurations 
of our symbolic operational semantics, see Sec. 5.2.) 

The small-step reduction relation $\to$ is defined in Fig. 3. In the rule for sample,
a random value $r \in (0, 1)$ is generated and recorded in the trace, while the weight
remains unchanged: in a uniform distribution on $(0, 1)$ each value is drawn with
likelihood 1. In the rule for $\mathsf{score}(\underline{r})$, the current weight is
multiplied by non-negative $r \in \mathbb{R}$: typically this reflects the likelihood of
the current execution given some observed data. Similarly to [8] we reduce terms
which cannot be reduced in a reasonable way (i.e. scoring with negative constants
or evaluating functions outside their domain) to fail.


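Example 2. We present a possible reduction sequence for the program in Ex. 1:

$$\begin{aligned}
\langle \mathrm{Ped},\ 1,\ [] \rangle
&\to^* \left\langle \begin{array}{l}
\mathsf{let}\ x = 0.2 \cdot 3\ \mathsf{in} \\
\mathsf{let}\ d = \mathrm{walk}\ x\ \mathsf{in} \\
\mathsf{let}\ w = \mathsf{score}(\mathrm{pdf}_{\mathcal{N}(1.1,0.1)}(d))\ \mathsf{in}\ x
\end{array},\ 1,\ [0.2] \right\rangle \\
&\to^* \left\langle \begin{array}{l}
\mathsf{let}\ d = \mathrm{walk}\ 0.6\ \mathsf{in} \\
\mathsf{let}\ w = \mathsf{score}(\mathrm{pdf}_{\mathcal{N}(1.1,0.1)}(d))\ \mathsf{in}\ 0.6
\end{array},\ 1,\ [0.2] \right\rangle \\
&\to^* \langle \mathsf{let}\ w = \mathsf{score}(\mathrm{pdf}_{\mathcal{N}(1.1,0.1)}(0.9))\ \mathsf{in}\ 0.6,\ 1,\ [0.2, 0.9, 0.7] \rangle \qquad (\star) \\
&\to^* \langle \mathsf{let}\ w = \mathsf{score}(0.54)\ \mathsf{in}\ 0.6,\ 1,\ [0.2, 0.9, 0.7] \rangle \\
&\to^* \langle 0.6,\ 0.54,\ [0.2, 0.9, 0.7] \rangle
\end{aligned}$$

In this execution, the initial sample yields 0.2, which is appended to the trace.
At step $(\star)$, we assume given a reduction sequence
$\langle \mathrm{walk}\ 0.6,\ 1,\ [0.2] \rangle \to^* \langle 0.9,\ 1,\ [0.2, 0.9, 0.7] \rangle$;
this means that in the call to walk, 0.9 was sampled as the step size and 0.7 as
the direction factor; this makes the new location $-0.3$, which is negative, so the
return value is 0.9. In the final step, we perform conditioning using the
likelihood of observing 0.9 given the data 1.1: the score() expression updates the
current weight using the density of 0.9 in the normal distribution with
parameters (1.1, 0.1).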


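Value and Weight Functions. Using the relation $\to$, we now aim to reason
more globally about probabilistic programs in terms of the traces they produce.
Let $M$ be an SPCF term with free variables amongst $x_1, \ldots, x_m$ of type R. Its
value function $\mathrm{value}_M : \mathbb{R}^m \times S \to \Lambda^0_v \cup \{\bot\}$ returns,
given values for each free variable and a trace, the output value of the program,
if the program terminates in a value. The weight function
$\mathrm{weight}_M : \mathbb{R}^m \times S \to \mathbb{R}_{\ge 0}$ returns the final weight of the
corresponding execution. Formally:

$$\mathrm{value}_M(\boldsymbol{r}, s) := \begin{cases} V & \text{if } \langle M[\underline{\boldsymbol{r}}/\boldsymbol{x}],\ 1,\ [] \rangle \to^* \langle V, w, s \rangle \\ \bot & \text{otherwise} \end{cases}$$

$$\mathrm{weight}_M(\boldsymbol{r}, s) := \begin{cases} w & \text{if } \langle M[\underline{\boldsymbol{r}}/\boldsymbol{x}],\ 1,\ [] \rangle \to^* \langle V, w, s \rangle \\ 0 & \text{otherwise} \end{cases}$$

For closed SPCF terms $M$ we just write $\mathrm{weight}_M(s)$ for $\mathrm{weight}_M([], s)$
(similarly for $\mathrm{value}_M$), and it follows already from [8, Lemma 9] that the
functions $\mathrm{value}_M$ and $\mathrm{weight}_M$ are measurable (see also Sec. 4.1).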
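
The following sketch replays the running example Ped on a fixed trace and returns the pair (value, weight), as a rough executable reading of the two definitions above. The function names are hypothetical, and we read the second parameter 0.1 of the normal density as a standard deviation; this is an assumption on our part, chosen because it reproduces the weight 0.54 of Ex. 2.

```python
import math


def normal_pdf(x, mean, std):
    """Density of Normal(mean, std) at x (std is a standard deviation)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def run_ped(trace):
    """Replay Ped on `trace`; return (value, weight).

    Returns (None, 0.0) when the trace is too short or too long, mirroring
    the 'otherwise' cases of the value and weight functions.
    """
    draws = iter(trace)
    try:
        start = 3.0 * next(draws)        # let x = sample * 3
        position, distance = start, 0.0
        while position > 0:              # walk returns once x <= 0
            step = next(draws)           # let s = sample
            direction = next(draws)      # the sample guarding the branch
            distance += step
            position += step if direction <= 0.5 else -step
    except StopIteration:
        return None, 0.0                 # the run needs more samples
    if next(draws, None) is not None:
        return None, 0.0                 # the run leaves samples unused
    weight = normal_pdf(distance, 1.1, 0.1)   # score(pdf_N(1.1, 0.1)(d))
    return start, weight


if __name__ == "__main__":
    print(run_ped([0.2, 0.9, 0.7]))      # the trace of Ex. 2
    print(run_ped([0.2]))                # too short: no terminating run
```

Traces that are not consumed exactly are mapped to weight 0, so only traces that correspond to a complete run contribute to the density.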

Finally, every closed SPCF term $M$ has an associated value measure

$$[\![M]\!] : \Sigma_{\Lambda^0_v} \to \mathbb{R}_{\ge 0}$$

defined by $[\![M]\!](U) := \int_{\mathrm{value}_M^{-1}(U)} d\mu_S\ \mathrm{weight}_M$. This
corresponds to the denotational semantics of SPCF in the $\omega$-quasi-Borel space
model via computational adequacy [51].


Returning to Remark 1, what are the connections, if any, between the two
types of density of a program? To distinguish them, let's refer to the weight
function of the program, $\mathrm{weight}_M$, as its trace density, and the
Radon-Nikodym derivative of the program's value measure,
$\frac{d[\![M]\!]}{d\nu}$ where $\nu$ is the reference measure of the measurable space
$\Sigma_{\Lambda^0_v}$, as the output density. Observe that, for any measurable
function $f : \Lambda^0_v \to [0, \infty]$,

$$\int_{\Lambda^0_v} d[\![M]\!]\ f = \int_{\mathrm{value}_M^{-1}(\Lambda^0_v)} d\mu_S\ \mathrm{weight}_M \cdot (f \circ \mathrm{value}_M) = \int_S d\mu_S\ \mathrm{weight}_M \cdot (f \circ \mathrm{value}_M)$$

(because if $s \notin \mathrm{value}_M^{-1}(\Lambda^0_v)$ then $\mathrm{weight}_M(s) = 0$).
It follows that we can express any expectation w.r.t. the output density
$\frac{d[\![M]\!]}{d\nu}$ as an expectation w.r.t. the trace density $\mathrm{weight}_M$. If
our aim is, instead, to generate samples from $\frac{d[\![M]\!]}{d\nu}$ then we can simply
generate samples from $\mathrm{weight}_M$, and deterministically convert each sample
to the space $(\Lambda^0_v, \Sigma_{\Lambda^0_v})$ via the value function
$\mathrm{value}_M$. In other words, if our intended output is just a sequence of samples,
then our inference engine does not need to concern itself with the consequences
of change of variables.
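
As a small numerical check of this identity, consider the hypothetical closed term `let x = sample in let w = score(2 * x) in x`, which is not taken from the paper. Its trace density is supported on traces of length 1, where it equals 2s, and its value function is the identity. The sketch below computes a normalised expectation purely on trace space, using a midpoint rule as a stand-in for an inference algorithm.

```python
def weight(s):
    """Trace density of the hypothetical term on a length-1 trace [s]."""
    return 2.0 * s


def value(s):
    """Value function of the hypothetical term on a length-1 trace [s]."""
    return s


def integrate(h, n=100_000):
    """Midpoint rule for the integral of h over (0, 1)."""
    return sum(h((i + 0.5) / n) for i in range(n)) / n


def posterior_expectation(f):
    """Expectation of f under the normalised value measure, on trace space."""
    numerator = integrate(lambda s: weight(s) * f(value(s)))
    normaliser = integrate(weight)
    return numerator / normaliser


if __name__ == "__main__":
    print(posterior_expectation(lambda v: v))  # analytically 2/3
```

No density on the space of return values is ever formed: the integrand is the trace density times the test function composed with the value function.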
4 Differentiability of the Weight and Value Functions


To reason about the differential properties of these functions we place ourselves in 
a setting in which differentiation makes sense. We start with some preliminaries. 


4.1 Background on Differentiable Functions 


Basic real analysis gives a standard notion of differentiability at a point
$a \in \mathbb{R}^n$ for functions between Euclidean spaces $\mathbb{R}^n \to \mathbb{R}^m$. In this
context a function $f : \mathbb{R}^n \to \mathbb{R}^m$ is smooth on an open
$U \subseteq \mathbb{R}^n$ if it has derivatives of all orders at every point of $U$. The
theory of differential geometry (see e.g. the textbooks [50,29,28]) abstracts away
from Euclidean spaces to smooth manifolds. We recall the formal definitions.

A topological space $\mathcal{M}$ is locally Euclidean at a point $x \in \mathcal{M}$ if
$x$ has a neighbourhood $U$ such that there is a homeomorphism $\phi$ from $U$ onto
an open subset of $\mathbb{R}^n$, for some $n$. The pair $(U, \phi : U \to \mathbb{R}^n)$ is
called a chart (of dimension $n$). We say $\mathcal{M}$ is locally Euclidean if it is
locally Euclidean at every point. A manifold $\mathcal{M}$ is a Hausdorff, second
countable, locally Euclidean space.

Two charts, $(U, \phi : U \to \mathbb{R}^n)$ and $(V, \psi : V \to \mathbb{R}^m)$, are
compatible if the function $\psi \circ \phi^{-1} : \phi(U \cap V) \to \psi(U \cap V)$ is
smooth, with a smooth inverse. An atlas on $\mathcal{M}$ is a family
$\{(U_\alpha, \phi_\alpha)\}$ of pairwise compatible charts that cover $\mathcal{M}$.
A smooth manifold is a manifold equipped with an atlas.

It follows from the topological invariance of dimension that charts that cover
a part of the same connected component have the same dimension. We emphasise
that, although this might be considered slightly unusual, distinct connected
components need not have the same dimension. This is important for our
purposes: $S$ is easily seen to be a smooth manifold since each connected
component $S_i$ is diffeomorphic to $\mathbb{R}^i$. It is also straightforward to endow
the set $\Lambda$ of SPCF terms with a (smooth) manifold structure. Following [8] we
view $\Lambda$ as $\bigcup_{m \in \mathbb{N}} (\mathrm{SK}_m \times \mathbb{R}^m)$, where
$\mathrm{SK}_m$ is the set of SPCF terms with exactly $m$ place-holders (a.k.a.
skeleton terms) for numerals. Thus identified, we give $\Lambda$ the countable
disjoint union topology of the product topology of the discrete topology on
$\mathrm{SK}_m$ and the standard topology on $\mathbb{R}^m$. Note that the connected
components of $\Lambda$ have the form $\{M\} \times \mathbb{R}^m$, with $M$ ranging over
$\mathrm{SK}_m$, and $m$ over $\mathbb{N}$. So in particular, the subspace
$\Lambda_v \subseteq \Lambda$ of values inherits the manifold structure. We fix the
Borel algebra of this topology to be the $\sigma$-algebra on $\Lambda$.

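Given manifolds $(\mathcal{M}, \{U_\alpha, \phi_\alpha\})$ and
$(\mathcal{M}', \{V_\beta, \psi_\beta\})$, a function $f : \mathcal{M} \to \mathcal{M}'$ is
differentiable at a point $x \in \mathcal{M}$ if there are charts
$(U_\alpha, \phi_\alpha)$ about $x$ and $(V_\beta, \psi_\beta)$ about $f(x)$ such that the
composite $\psi_\beta \circ f \circ \phi_\alpha^{-1}$ restricted to the open subset
$\phi_\alpha(f^{-1}(V_\beta) \cap U_\alpha)$ is differentiable at $\phi_\alpha(x)$.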

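The definitions above are useful because they allow for a uniform presentation.
But it is helpful to unpack the definition of differentiability in a few
instances, and we see that they boil down to the standard sense in real analysis.
Take an SPCF term $M$ with free variables amongst $x_1, \ldots, x_m$ (all of type R),
and $(\boldsymbol{r}, s) \in \mathbb{R}^m \times S_n$.

- The function $\mathrm{weight}_M : \mathbb{R}^m \times S \to \mathbb{R}_{\ge 0}$ is
  differentiable at $(\boldsymbol{r}, s) \in \mathbb{R}^m \times S_n$ just if its restriction
  $\mathrm{weight}_M|_{\mathbb{R}^m \times S_n} : \mathbb{R}^m \times S_n \to \mathbb{R}_{\ge 0}$ is
  differentiable at $(\boldsymbol{r}, s)$.

- In case $M$ is of type R, $\mathrm{value}_M : \mathbb{R}^m \times S \to \Lambda^0_v \cup \{\bot\}$
  is in essence a partial function $\mathbb{R}^m \times S \rightharpoonup \mathbb{R}$. Precisely
  $\mathrm{value}_M$ is differentiable at $(\boldsymbol{r}, s)$ just if for some open
  neighbourhood $U \subseteq \mathbb{R}^m \times S_n$ of $(\boldsymbol{r}, s)$:

  1. $\mathrm{value}_M(\boldsymbol{r}', s') = \bot$ for all $(\boldsymbol{r}', s') \in U$; or

  2. $\mathrm{value}_M(\boldsymbol{r}', s') \ne \bot$ for all $(\boldsymbol{r}', s') \in U$, and
     $\mathrm{value}'_M : U \to \mathbb{R}$ is differentiable at $(\boldsymbol{r}, s)$, where we
     define $\mathrm{value}'_M(\boldsymbol{r}', s') := r''$ whenever
     $\mathrm{value}_M(\boldsymbol{r}', s') = \underline{r''}$.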


4.2 Why Almost Everywhere Differentiability Can Fail 


Conditional statements break differentiability. This is easy to see with an
example: the weight function of the term

$$\mathsf{if}(\mathsf{sample} \le \mathsf{sample},\ \mathsf{score}(1),\ \mathsf{score}(0))$$

is exactly the characteristic function of $\{[s_1, s_2] \in S \mid s_1 \le s_2\}$, which is
not differentiable on the diagonal $\{[s, s] \in S_2 \mid s \in (0, 1)\}$.
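
The behaviour of this weight function can be probed numerically. The sketch below (hypothetical helper names) evaluates a central finite difference in the first coordinate: away from the diagonal the quotient is zero, while on the diagonal it grows without bound as the step shrinks.

```python
def weight(s1, s2):
    """Weight of if(sample <= sample, score(1), score(0)) on the trace [s1, s2]."""
    return 1.0 if s1 <= s2 else 0.0


def partial_s1(s1, s2, h=1e-6):
    """Central finite difference of the weight in the first coordinate."""
    return (weight(s1 + h, s2) - weight(s1 - h, s2)) / (2.0 * h)


if __name__ == "__main__":
    print(partial_s1(0.2, 0.7))          # off the diagonal: locally constant
    print(partial_s1(0.7, 0.2))          # off the diagonal: locally constant
    print(partial_s1(0.5, 0.5, 1e-3))    # on the diagonal: large quotient
    print(partial_s1(0.5, 0.5, 1e-6))    # smaller step: even larger quotient
```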

This function is however differentiable almost everywhere: the diagonal is an
uncountable set but has $\mathrm{Leb}_2$ measure zero in the space $S_2$. Unfortunately,
this is not true in general. Without sufficient restrictions, conditional
statements also break almost everywhere differentiability. This can happen for
two reasons.


Problem 1: Pathological Primitive Functions. Recall that our definition
of SPCF is parametrised by a set $\mathcal{F}$ of primitive functions. It is tempting in
this context to take $\mathcal{F}$ to be the set of all differentiable functions, but this
is too general, as we show now. Consider that for every $f : \mathbb{R} \to \mathbb{R}$ the term

$$\mathsf{if}(\underline{f}(\mathsf{sample}) \le 0,\ \mathsf{score}(1),\ \mathsf{score}(0))$$

has weight function the characteristic function of
$\{[s_1] \in S \mid f(s_1) \le 0\}$. This function is non-differentiable at every
$s_1 \in S_1 \cap \partial f^{-1}(-\infty, 0]$: in every neighbourhood of $s_1$ there are
$s_1'$ and $s_1''$ such that $f(s_1') \le 0$ and $f(s_1'') > 0$. One can construct a
differentiable $f$ for which this is not a measure zero set. (For example, there
exists a non-negative function $f$ which is zero exactly on a fat Cantor set, i.e.,
a Cantor-like set with strictly positive measure. See [43, Ex. 5.21].)


Problem 2: Non-Terminating Runs. Our language has recursion, so we can
construct a term which samples a random number, halts if this number is in
$\mathbb{Q} \cap [0, 1]$, and diverges otherwise. In pseudo-code:

    let rec enumQ p q r =
      if (r = p/q) then (score 1) else
        if (r < p/q) then
          enumQ p (q+1) r
        else
          enumQ (p+1) q r
    in enumQ 0 1 sample

The induced weight function is the characteristic function of
$\{[s_1] \in S \mid s_1 \in \mathbb{Q}\}$; the set of points at which this function is
non-differentiable is $S_1$, which has measure 1.
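
The search performed by enumQ can be simulated with exact rational arithmetic. The sketch below is an illustration only: the `fuel` bound is a device of the sketch, not of the paper, and since an irrational number cannot be represented as a `Fraction`, only the halting side of the behaviour can be observed directly.

```python
from fractions import Fraction


def enum_q(r, fuel):
    """Run enumQ 0 1 r for at most `fuel` steps.

    Returns the number of recursive calls made before halting, or None if
    the fuel runs out first.
    """
    p, q = 0, 1
    for steps in range(fuel):
        current = Fraction(p, q)
        if r == current:
            return steps             # r = p/q: the program halts
        if r < current:
            q += 1                   # enumQ p (q+1) r
        else:
            p += 1                   # enumQ (p+1) q r
    return None


if __name__ == "__main__":
    print(enum_q(Fraction(1, 2), 100))
    print(enum_q(Fraction(2, 5), 100))
    print(enum_q(Fraction(1, 10**6), 100))   # halts eventually, but not within 100 steps
```

The number of steps grows with the numerator and denominator of the input, which gives some intuition for why the search cannot halt on an irrational input.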

We proceed to overcome Problem 1 by making appropriate assumptions on 
the set of primitives. We will then address Problem 2 by focusing on almost 
surely terminating programs. 


4.3 Admissible Primitive Functions 


One contribution of this work is to identify sufficient conditions for F. We will 
show in Sec. 6 that our main result holds provided: 


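Assumption 1 (Admissible Primitive Functions). $\mathcal{F}$ is a set of partial,
measurable functions $\mathbb{R}^\ell \rightharpoonup \mathbb{R}$ including all constant and
projection functions which satisfies

1. if $f : \mathbb{R}^\ell \rightharpoonup \mathbb{R}$ and $g_i : \mathbb{R}^m \rightharpoonup \mathbb{R}$ are
   elements of $\mathcal{F}$ for $i = 1, \ldots, \ell$, then
   $f \circ \langle g_i \rangle_{i=1}^{\ell} : \mathbb{R}^m \rightharpoonup \mathbb{R}$ is in $\mathcal{F}$

2. if $(f : \mathbb{R}^\ell \rightharpoonup \mathbb{R}) \in \mathcal{F}$, then $f$ is differentiable in
   the interior of $\mathrm{dom}(f)$

3. if $(f : \mathbb{R}^\ell \rightharpoonup \mathbb{R}) \in \mathcal{F}$, then
   $\mathrm{Leb}_\ell(\partial f^{-1}[0, \infty)) = 0$.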


Example 3. The following sets of primitive operations satisfy the above sufficient 
conditions. (See [34] for a proof.) 


1. The set $\mathcal{F}_1$ of analytic functions with co-domain $\mathbb{R}$. Recall that a
   function $f : \mathbb{R}^\ell \to \mathbb{R}^n$ is analytic if it is infinitely differentiable
   and its multivariate Taylor expansion at every point $x_0 \in \mathbb{R}^\ell$ converges
   pointwise to $f$ in a neighbourhood of $x_0$.

2. The set $\mathcal{F}_2$ of (partial) functions $f : \mathbb{R}^\ell \rightharpoonup \mathbb{R}$ such
   that $\mathrm{dom}(f)$ is open³, and $f$ is differentiable everywhere in
   $\mathrm{dom}(f)$, and $f^{-1}(I)$ is a finite union of (possibly unbounded)
   rectangles⁴ for (possibly unbounded) intervals $I$.

Note that all primitive functions mentioned in our examples (and in particular
the density of the normal distribution) are included in both $\mathcal{F}_1$ and
$\mathcal{F}_2$.

It is worth noting that both $\mathcal{F}_1$ and $\mathcal{F}_2$ satisfy the following
stronger (than Assumption 1.3) property: $\mathrm{Leb}_n(\partial f^{-1}(I)) = 0$ for every
interval $I$, for every primitive function $f$.


³ This requirement is crucial, and cannot be relaxed.
⁴ i.e. a finite union of $I_1 \times \cdots \times I_\ell$ for (possibly unbounded) intervals $I_i$
4.4 Almost Sure Termination


To rule out the contrived counterexamples which diverge we restrict attention to 
almost surely terminating SPCF terms. Intuitively, a program M (closed term 
of ground type) is almost surely terminating if the probability that a run of M 
terminates is 1. 

Take an SPCF term $M$ with variables amongst $x_1, \ldots, x_m$ (all of type R),
and set

$$T_{M,\mathrm{term}} := \{ (\boldsymbol{r}, s) \in \mathbb{R}^m \times S \mid \exists V, w.\ \langle M[\underline{\boldsymbol{r}}/\boldsymbol{x}],\ 1,\ [] \rangle \to^* \langle V, w, s \rangle \}. \qquad (2)$$

Let us first consider the case of closed $M \in \Lambda^0$, i.e. $m = 0$ (notice that
the measure $\mu_{\mathbb{R}^m \times S}$ is not finite, for $m \ge 1$). As
$T_{M,\mathrm{term}}$ now coincides with $\mathrm{value}_M^{-1}(\Lambda^0_v)$,
$T_{M,\mathrm{term}}$ is a measurable subset of $S$. Plainly if $M$ is deterministic
(i.e. sample-free), then $\mu_S(T_{M,\mathrm{term}}) = 1$ if $M$ converges to a value, and
0 otherwise. Generally for an arbitrary (stochastic) term $M$ we can regard
$\mu_S(T_{M,\mathrm{term}})$ as the probability that a run of $M$ converges to a value,
because of Lem. 1.


Lemma 1. If $M \in \Lambda^0$ then $\mu_S(T_{M,\mathrm{term}}) \leq 1$.


More generally, if $M$ has free variables amongst $x_1, \dots, x_m$ (all of type R),
then we say that $M$ is almost surely terminating if for almost every (instantiation
of the free variables by) $r \in \mathbb{R}^m$, $M[r/x]$ terminates with probability 1.

We formalise the notion of almost sure termination as follows. 


Definition 1. Let M be an SPCF term. We say that M terminates almost 
surely if 


1. $M$ is closed and $\mu(T_{M,\mathrm{term}}) = \mu(\mathrm{value}_M^{-1}(\Lambda^0_v)) = 1$; or

2. $M$ has free variables amongst $x_1, \dots, x_m$ (all of which are of type R), and
there exists $T \in \Sigma_{\mathbb{R}^m}$ such that $\mathrm{Leb}_m(\mathbb{R}^m \setminus T) = 0$ and for each $r \in T$,
$M[r/x]$ terminates almost surely.
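
As a small worked illustration of item 1 (the term below is our own example, not one from the text, and it assumes that $S = \bigcup_n (0,1)^n$ carries the sum of the Lebesgue measures on each $(0,1)^n$), consider a closed term that returns a value on one branch and diverges without sampling on the other:

```latex
\[
M \equiv \mathrm{if}(\mathrm{sample} - 0.5 \le 0,\ 0,\ (Y(\lambda f.\, f))\, 0)
\]
\[
T_{M,\mathrm{term}} = \{ [s_1] \mid s_1 \in (0, 0.5] \},
\qquad
\mu_S(T_{M,\mathrm{term}}) = \mathrm{Leb}_1((0, 0.5]) = 0.5 < 1 .
\]
```

Under these assumptions this $M$ does not terminate almost surely: half of the one-sample traces lead to the diverging branch.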


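Suppose that $M$ is a closed term and $M^\flat$ is obtained from $M$ by recursively
replacing subterms score($L$) with the term if($L < 0$, $N_{\mathrm{fail}}$, $L$), where $N_{\mathrm{fail}}$ is a term
that reduces to fail such as $1/0$. It is easy to see that for all $s \in S$,
$\langle M^\flat, 1, [] \rangle \to^* \langle V, 1, s \rangle$ iff for some (unique) $w \in \mathbb{R}_{\geq 0}$,
$\langle M, 1, [] \rangle \to^* \langle V, w, s \rangle$. Therefore,

\[
[\![M^\flat]\!](\Lambda^0_v)
= \int_{\mathrm{value}_{M^\flat}^{-1}(\Lambda^0_v)} \mathrm{d}\mu_S \; \mathrm{weight}_{M^\flat}
= \mu_S(\{ s \in S \mid \exists V.\ \langle M^\flat, 1, [] \rangle \to^* \langle V, 1, s \rangle \})
= \mu_S(T_{M,\mathrm{term}})
\]

Consequently, the closed term $M$ terminates almost surely iff $[\![M^\flat]\!]$ is a probability
measure.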


Remark 2. - Like many treatments of semantics of probabilistic programs in
the literature, we make no distinction between non-terminating runs and
aborted runs of a (closed) term $M$: both could result in the value semantics
$[\![M^\flat]\!]$ being a sub-probability measure (cf. [4]).
- Even so, current probabilistic programming systems do not place any restrictions
on the code that users can write: it is perfectly possible to construct
invalid models because catching programs that do not define valid probability
distributions can be hard, or even impossible. This is not surprising,
because almost sure termination is hard to decide: it is $\Pi^0_2$-complete in the
arithmetic hierarchy [22]. Nevertheless, because a.s. termination is an important
correctness property of probabilistic programs (not least because of
the main result of this paper, Thm. 3), the development of methods to prove
a.s. termination is a hot research topic.


Accordingly the main theorem of this paper is stated as follows: 


Theorem 3. Let $M$ be an SPCF term (possibly with free variables of type R)
which terminates almost surely. Then its weight function $\mathrm{weight}_M$ and value
function $\mathrm{value}_M$ are differentiable almost everywhere.


5 Stochastic Symbolic Execution 


We have seen that a source of discontinuity is the use of if-statements. Our main 
result therefore relies on an in-depth understanding of the branching behaviour 
of programs. The operational semantics given in Sec. 3 is unsatisfactory in this 
respect: any two execution paths are treated independently, whether they go 
through different branches of an if-statement or one is obtained from the other 
by using slightly perturbed random samples not affecting the control flow. 

More concretely, note that although we have derived $\mathrm{weight}_{\mathrm{Ped}}[0.2, 0.9, 0.7] = 0.54$
and $\mathrm{value}_{\mathrm{Ped}}[0.2, 0.9, 0.7] = 0.6$ in Ex. 2, we cannot infer anything about
$\mathrm{weight}_{\mathrm{Ped}}[0.21, 0.91, 0.71]$ and $\mathrm{value}_{\mathrm{Ped}}[0.21, 0.91, 0.71]$ unless we perform the
corresponding reduction.

So we propose an alternative symbolic operational semantics (similar to the 
“compilation scheme” in [55]), in which no sampling is performed: whenever a 
sample command is encountered, we simply substitute a fresh variable $\alpha_i$ for
it, and continue on with the execution. We can view this style of semantics 
as a stochastic form of symbolic execution [12,23], i.e., a means of analysing a 
program so as to determine what inputs, and random draws (from sample) cause 
each part of a program to execute. 

Consider the term $M \equiv$ let $x$ = sample $\cdot$ 3 in (walk $x$), defined using the function
walk of Ex. 1. We have a reduction path

\[
M \Rightarrow \mathrm{let}\ x = \alpha_1 \cdot 3\ \mathrm{in}\ (\mathrm{walk}\ x) \Rightarrow \mathrm{walk}\ (\alpha_1 \cdot 3)
\]

but at this point we are stuck: the CBV strategy requires a value for $\alpha_1$. We will
"delay" the evaluation of the multiplication $\alpha_1 \cdot 3$; we signal this by drawing a
box around the delayed operation: $\alpha_1 \boxdot 3$. We continue the execution, inspecting
the definition of walk, and get:

\[
M \Rightarrow^* \mathrm{walk}\ (\alpha_1 \boxdot 3) \Rightarrow^* N \equiv \mathrm{if}(\alpha_1 \boxdot 3 \le 0,\ 0,\ P)
\]

where

\[
P \equiv \big(\mathrm{let}\ s = \mathrm{sample}\ \mathrm{in}\ \mathrm{if}((\mathrm{sample} \le 0.5),\ (s + \mathrm{walk}(\alpha_1 \boxdot 3 + s)),\ (s + \mathrm{walk}(\alpha_1 \boxdot 3 - s)))\big).
\]


We are stuck again: the value of $\alpha_1$ is needed in order to know which branch
to follow. Our approach consists in considering the space $S_1 = (0,1)$ of possible
values for $\alpha_1$, and splitting it into $\{ s_1 \in (0,1) \mid s_1 \cdot 3 \le 0 \} = \emptyset$ and
$\{ s_1 \in (0,1) \mid s_1 \cdot 3 > 0 \} = (0,1)$. Each of the two branches will then yield a weight function
restricted to the appropriate subspace.

Formally, our symbolic operational semantics is a rewrite system of configurations
of the form $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle$, where $\mathcal{M}$ is a term with delayed (boxed) operations,
and free "sampling" variables⁵ $\alpha_1, \dots, \alpha_n$; $U \subseteq S_n$ is the subspace of sampling
values compatible with the current branch; and $w : U \to \mathbb{R}_{\geq 0}$ is a function
assigning to each $s \in U$ a weight $w(s)$. In particular, for our running example⁶

\[
\langle\!\langle M, \lambda[].\, 1, S_0 \rangle\!\rangle \Rightarrow^* \langle\!\langle N, \lambda[s_1].\, 1, (0,1) \rangle\!\rangle .
\]


As explained above, this leads to two branches: 


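\[
\langle\!\langle N, \lambda[s_1].\, 1, (0,1) \rangle\!\rangle \Rightarrow \langle\!\langle 0, \lambda[s_1].\, 1, \emptyset \rangle\!\rangle
\qquad\text{and}\qquad
\langle\!\langle N, \lambda[s_1].\, 1, (0,1) \rangle\!\rangle \Rightarrow \langle\!\langle P, \lambda[s_1].\, 1, (0,1) \rangle\!\rangle
\]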


The first branch has reached a value, and the reader can check that the second 
branch continues as 


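\[
\langle\!\langle P, \lambda[s_1].\, 1, (0,1) \rangle\!\rangle \Rightarrow^*
\langle\!\langle \mathrm{if}(\alpha_3 \le 0.5,\ \alpha_2 + \mathrm{walk}(\alpha_1 \boxdot 3 + \alpha_2),\ \alpha_2 + \mathrm{walk}(\alpha_1 \boxdot 3 - \alpha_2)),\ \lambda[s_1, s_2, s_3].\, 1,\ (0,1)^3 \rangle\!\rangle
\]

where $\alpha_2$ and $\alpha_3$ stand for the two sample statements in $P$. From here we proceed
by splitting $(0,1)^3$ into $(0,1) \times (0,1) \times (0, 0.5]$ and $(0,1) \times (0,1) \times (0.5, 1)$ and
after having branched again (on whether we have passed 0) the evaluation of
walk can terminate in the configuration

\[
\langle\!\langle \alpha_2 \boxplus 0,\ \lambda[s_1, s_2, s_3].\, 1,\ U \rangle\!\rangle
\]

where $U := \{ [s_1, s_2, s_3] \in S_3 \mid s_3 > 0.5 \wedge s_1 \cdot 3 - s_2 \le 0 \}$.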
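Recall that $M$ appears in the context of our running example Ped. Using our
calculations above we derive one of its branches:

\begin{align*}
\langle\!\langle \mathrm{Ped}, \lambda[].\, 1, \{[]\} \rangle\!\rangle
 &\Rightarrow^* \langle\!\langle \mathrm{let}\ w = \mathrm{score}(\mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}(\alpha_2))\ \mathrm{in}\ \alpha_1 \boxdot 3,\ \lambda[s_1, s_2, s_3].\, 1,\ U \rangle\!\rangle \\
 &\Rightarrow \langle\!\langle \mathrm{let}\ w = \mathrm{score}(\boxed{\mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}}(\alpha_2))\ \mathrm{in}\ \alpha_1 \boxdot 3,\ \lambda[s_1, s_2, s_3].\, 1,\ U \rangle\!\rangle \\
 &\Rightarrow^* \langle\!\langle \mathrm{let}\ w = \boxed{\mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}}(\alpha_2)\ \mathrm{in}\ \alpha_1 \boxdot 3,\ \lambda[s_1, s_2, s_3].\, \mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}(s_2),\ U \rangle\!\rangle \\
 &\Rightarrow^* \langle\!\langle \alpha_1 \boxdot 3,\ \lambda[s_1, s_2, s_3].\, \mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}(s_2),\ U \rangle\!\rangle
\end{align*}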


5 Note that $\mathcal{M}$ may be open and contain other free "non-sampling" variables, usually
denoted $x_1, \dots, x_m$.
6 We use the meta-lambda-abstraction $\lambda x.\, f(x)$ to denote the set-theoretic function $x \mapsto f(x)$.
In particular the trace $[0.2, 0.9, 0.7]$ of Ex. 2 lies in the subspace $U$. We can immediately
read off the corresponding value and weight functions for all $[s_1, s_2, s_3] \in U$
simply by evaluating the computation $\alpha_1 \cdot 3$, which we have delayed until now:

\[
\mathrm{value}_{\mathrm{Ped}}[s_1, s_2, s_3] = s_1 \cdot 3
\qquad
\mathrm{weight}_{\mathrm{Ped}}[s_1, s_2, s_3] = \mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}(s_2)
\]
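
The read-off above can be checked numerically. The sketch below is illustrative only: the bodies of `walk` and `run_ped` are reconstructed from the fragments quoted in this section (Ex. 1 and Ex. 2 are not reproduced here), and all function names are our own. It runs the concrete trace-based semantics on a fixed trace and compares the result with the symbolic formulas, which are only claimed to hold on $U$.

```python
import math

def normal_pdf(x, mean=1.1, std=0.1):
    """Density of the normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def run_ped(trace):
    """Concrete run of the (assumed) pedestrian program on a fixed trace.

    Returns (value, weight). Samples are consumed from left to right.
    """
    samples = iter(trace)

    def walk(x):
        if x <= 0:
            return 0.0
        s = next(samples)            # step length
        if next(samples) <= 0.5:     # direction of the step
            return s + walk(x + s)
        return s + walk(x - s)

    start = next(samples) * 3
    distance = walk(start)
    weight = normal_pdf(distance)    # effect of score(pdf(distance))
    return start, weight

def in_U(trace):
    """Membership in U = {[s1, s2, s3] | s3 > 0.5 and s1*3 - s2 <= 0}."""
    s1, s2, s3 = trace
    return s3 > 0.5 and s1 * 3 - s2 <= 0

def symbolic_read_off(trace):
    """value = s1 * 3 and weight = pdf(s2); meaningful only on U."""
    s1, s2, _ = trace
    return s1 * 3, normal_pdf(s2)

for trace in ([0.2, 0.9, 0.7], [0.21, 0.91, 0.71]):
    assert in_U(trace)
    print(trace, run_ped(trace), symbolic_read_off(trace))
```

Both traces lie in $U$, so the perturbed trace needs no separate reduction: the same two formulas apply to it.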


5.1 Symbolic Terms and Values 


We have just described informally our symbolic execution approach, which involves
delaying the evaluation of primitive operations. We make this formal by
introducing an extended notion of terms, which we call symbolic terms and
define in Fig. 4a along with a notion of symbolic values. For this we assume
fixed denumerable sequences of distinguished variables: $\alpha_1, \alpha_2, \dots$, used to represent
sampling, and $x_1, x_2, \dots$ used for free variables of type R. Symbolic terms
are typically denoted $\mathcal{M}$, $\mathcal{N}$, or $\mathcal{L}$. They contain terms of the form $\boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell)$
for $f : \mathbb{R}^\ell \rightharpoonup \mathbb{R} \in \mathcal{F}$ a primitive function, representing delayed evaluations, and
they also contain the sampling variables $\alpha_j$. The type system is adapted in a
straightforward way, see Fig. 4b.

We use $\Lambda_{(m,n)}$ to refer to the set of well-typed symbolic terms with free
variables amongst $x_1, \dots, x_m$ and $\alpha_1, \dots, \alpha_n$ (and all are of type R). Note that
every term in the sense of Fig. 2 is also a symbolic term.

Each symbolic term $\mathcal{M} \in \Lambda_{(m,n)}$ has a corresponding set of regular terms,
accounting for all possible values for its sampling variables $\alpha_1, \dots, \alpha_n$ and its
(other) free variables $x_1, \dots, x_m$. For $r \in \mathbb{R}^m$ and $s \in S_n$, we call partially evaluated
instantiation of $\mathcal{M}$ the term $\lfloor \mathcal{M} \rfloor (r, s)$ obtained from $\mathcal{M}[r/x, s/\alpha]$ by
recursively "evaluating" boxed subterms of the form $\boxed{f}(r_1, \dots, r_\ell)$ to the real constant
$f(r_1, \dots, r_\ell)$, provided $(r_1, \dots, r_\ell) \in \mathrm{dom}(f)$. In this operation, unboxed subterms of the
form $f(r_1, \dots, r_\ell)$ are left unchanged, and so are any other redexes. $\lfloor \mathcal{M} \rfloor$ can be viewed as a
partial function $\lfloor \mathcal{M} \rfloor : \mathbb{R}^m \times S_n \rightharpoonup \Lambda$ and a formal definition is presented in
Fig. 5b. (To be completely rigorous, we define for fixed $m$ and $n$, partial functions
$\lfloor \mathcal{M} \rfloor_{m,n} : \mathbb{R}^m \times S_n \rightharpoonup \Lambda$ for symbolic terms $\mathcal{M}$ whose distinguished variables
are amongst $x_1, \dots, x_m$ and $\alpha_1, \dots, \alpha_n$. $\mathcal{M}$ may contain other variables $y, z, \dots$
of any type. Since $m$ and $n$ are usually clear from the context, we omit them.)
Observe that for $\mathcal{M} \in \Lambda_{(m,n)}$ and $(r, s) \in \mathrm{dom} \lfloor \mathcal{M} \rfloor$, $\lfloor \mathcal{M} \rfloor (r, s)$ is a closed term.


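Example 4. Consider $\mathcal{M} \equiv (\lambda z.\, \alpha_1 \boxdot 3)\,(\mathrm{score}(\mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}(\alpha_2)))$. Then, for $r = []$
and $s = [0.2, 0.9, 0.7]$, we have $\lfloor \mathcal{M} \rfloor (r, s) = (\lambda z.\, 0.6)\,(\mathrm{score}(\mathrm{pdf}_{\mathcal{N}(1.1, 0.1)}(0.9)))$.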
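
The instantiation in Example 4 can be mimicked by a short recursive function. This is a minimal sketch under our own assumptions: terms are encoded as nested tuples, the tags and the `PRIMS` table are hypothetical names, only a fragment of the syntax is covered, and domain checks are omitted because both primitives used here are total.

```python
import math

# Primitive functions available to the sketch (names are our own).
PRIMS = {
    "mul": lambda a, b: a * b,
    "pdf": lambda x: math.exp(-0.5 * ((x - 1.1) / 0.1) ** 2)
                     / (0.1 * math.sqrt(2.0 * math.pi)),
}

def instantiate(term, r, s):
    """Partially evaluated instantiation of a symbolic term at (r, s).

    Terms are nested tuples:
      ("const", c)  ("x", i)  ("alpha", j)  ("var", y)
      ("boxed", f, args)  ("prim", f, args)
      ("lam", y, body)  ("app", m, n)  ("score", m)
    Indices i and j are 1-based, as in the text.
    """
    tag = term[0]
    if tag in ("const", "var"):
        return term
    if tag == "x":
        return ("const", r[term[1] - 1])
    if tag == "alpha":
        return ("const", s[term[1] - 1])
    if tag == "boxed":
        # Delayed operation: its arguments are symbolic values of type R,
        # so they instantiate to constants and the primitive is applied now.
        args = [instantiate(a, r, s)[1] for a in term[2]]
        return ("const", PRIMS[term[1]](*args))
    if tag == "prim":
        # Ordinary (unboxed) primitive application: left as a redex.
        return ("prim", term[1], [instantiate(a, r, s) for a in term[2]])
    if tag == "lam":
        return ("lam", term[1], instantiate(term[2], r, s))
    if tag == "app":
        return ("app", instantiate(term[1], r, s), instantiate(term[2], r, s))
    if tag == "score":
        return ("score", instantiate(term[1], r, s))
    raise ValueError(f"unknown term: {tag}")

# Example 4: (lambda z. alpha_1 [*] 3) (score(pdf(alpha_2)))
example4 = ("app",
            ("lam", "z", ("boxed", "mul", [("alpha", 1), ("const", 3)])),
            ("score", ("prim", "pdf", [("alpha", 2)])))
result = instantiate(example4, [], [0.2, 0.9, 0.7])
print(result)
```

The boxed multiplication collapses to a constant (0.6 up to floating-point rounding), while the unboxed `pdf` application stays in the term with its argument replaced by 0.9.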


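More generally, observe that if $\Gamma \vdash \mathcal{M} : \sigma$ and $(r, s) \in \mathrm{dom} \lfloor \mathcal{M} \rfloor$ then
$\Gamma \vdash \lfloor \mathcal{M} \rfloor (r, s) : \sigma$. In order to evaluate conditionals $\mathrm{if}(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N})$ we need to
reduce $\mathcal{L}$ to a real constant, i.e., we need to have $\lfloor \mathcal{L} \rfloor (r, s) = r'$ for some $r' \in \mathbb{R}$.
This is the case whenever $\mathcal{L}$ is a symbolic value of type R, since these are built
only out of delayed operations, real constants and distinguished variables $x_i$ or
$\alpha_j$. Indeed we can show the following:


Lemma 2. Let $(r, s) \in \mathrm{dom} \lfloor \mathcal{M} \rfloor$. Then $\mathcal{M}$ is a symbolic value iff $\lfloor \mathcal{M} \rfloor (r, s)$ is
a value.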
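\[
\mathcal{V} ::= r \mid x_i \mid \alpha_j \mid \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \mid \lambda y.\, \mathcal{M}
\]
\[
\mathcal{M}, \mathcal{N}, \mathcal{L} ::= \mathcal{V} \mid y \mid f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) \mid \mathcal{M}\,\mathcal{N} \mid Y\mathcal{M} \mid \mathrm{if}(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) \mid \mathrm{sample} \mid \mathrm{score}(\mathcal{M})
\]

(a) Symbolic values (typically $\mathcal{V}$) and symbolic terms (typically $\mathcal{M}$, $\mathcal{N}$ or $\mathcal{L}$)

\[
\dfrac{\Gamma \vdash \mathcal{V}_1 : R \quad \cdots \quad \Gamma \vdash \mathcal{V}_\ell : R}{\Gamma \vdash \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) : R}
\qquad
\dfrac{}{\Gamma \vdash x_i : R}
\qquad
\dfrac{}{\Gamma \vdash \alpha_j : R}
\]
\[
\dfrac{}{\Gamma, y : \sigma \vdash y : \sigma}
\qquad
\dfrac{}{\Gamma \vdash r : R}
\qquad
\dfrac{\Gamma \vdash \mathcal{M}_1 : R \quad \cdots \quad \Gamma \vdash \mathcal{M}_\ell : R}{\Gamma \vdash f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) : R}
\]
\[
\dfrac{\Gamma, y : \sigma \vdash \mathcal{M} : \tau}{\Gamma \vdash \lambda y.\, \mathcal{M} : \sigma \to \tau}
\qquad
\dfrac{\Gamma \vdash \mathcal{M} : \sigma \to \tau \quad \Gamma \vdash \mathcal{N} : \sigma}{\Gamma \vdash \mathcal{M}\,\mathcal{N} : \tau}
\qquad
\dfrac{\Gamma \vdash \mathcal{M} : (\sigma \to \tau) \to \sigma \to \tau}{\Gamma \vdash Y\mathcal{M} : \sigma \to \tau}
\]
\[
\dfrac{\Gamma \vdash \mathcal{L} : R \quad \Gamma \vdash \mathcal{M} : \sigma \quad \Gamma \vdash \mathcal{N} : \sigma}{\Gamma \vdash \mathrm{if}(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) : \sigma}
\qquad
\dfrac{}{\Gamma \vdash \mathrm{sample} : R}
\qquad
\dfrac{\Gamma \vdash \mathcal{M} : R}{\Gamma \vdash \mathrm{score}(\mathcal{M}) : R}
\]

(b) Type system for symbolic terms

\[
\mathcal{R} ::= (\lambda y.\, \mathcal{M})\,\mathcal{V} \mid f(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \mid Y(\lambda y.\, \mathcal{M}) \mid \mathrm{if}(\mathcal{V} \le 0, \mathcal{M}, \mathcal{N}) \mid \mathrm{sample} \mid \mathrm{score}(\mathcal{V})
\]
\[
\mathcal{E} ::= [] \mid \mathcal{E}\,\mathcal{M} \mid (\lambda y.\, \mathcal{M})\,\mathcal{E} \mid f(\mathcal{V}_1, \dots, \mathcal{V}_{i-1}, \mathcal{E}, \mathcal{M}_{i+1}, \dots, \mathcal{M}_\ell) \mid Y\mathcal{E} \mid \mathrm{if}(\mathcal{E} \le 0, \mathcal{M}, \mathcal{N}) \mid \mathrm{score}(\mathcal{E})
\]

(c) Symbolic values (typically $\mathcal{V}$), redexes ($\mathcal{R}$) and reduction contexts ($\mathcal{E}$).


Fig. 4: Symbolic terms and values, type system, reduction contexts, and redexes.
As usual $f \in \mathcal{F}$ and $r \in \mathbb{R}$.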


For symbolic values $\mathcal{V} : R$ and $(r, s) \in \mathrm{dom} \lfloor \mathcal{V} \rfloor$ we employ the notation
$\lVert \mathcal{V} \rVert (r, s) := r'$ provided that $\lfloor \mathcal{V} \rfloor (r, s) = r'$.

A simple induction on symbolic terms and values yields the following prop- 
erty, which is crucial for the proof of our main result (Thm. 3): 


Lemma 3. Suppose the set F of primitives satisfies Item 1 of Assumption 1. 


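1. For each symbolic value $\mathcal{V}$ of type R, by identifying $\mathrm{dom} \lVert \mathcal{V} \rVert$ with a subset
of $\mathbb{R}^{m+n}$, we have $\lVert \mathcal{V} \rVert \in \mathcal{F}$.

2. If $\mathcal{F}$ also satisfies Item 2 of Assumption 1 then for each symbolic term $\mathcal{M}$,
$\lfloor \mathcal{M} \rfloor : \mathbb{R}^m \times S_n \rightharpoonup \Lambda$ is differentiable in the interior of its domain.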


5.2 Symbolic Operational Semantics 


We aim to develop a symbolic operational semantics that provides a sound and 
complete abstraction of the (concrete) operational trace semantics. The symbolic 
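\begin{align*}
\mathrm{dom} \lfloor \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \rfloor &:= \{ (r, s) \in \mathrm{dom} \lfloor \mathcal{V}_1 \rfloor \cap \cdots \cap \mathrm{dom} \lfloor \mathcal{V}_\ell \rfloor \mid (r'_1, \dots, r'_\ell) \in \mathrm{dom}(f), \\
 &\qquad \text{where } r'_1 = \lfloor \mathcal{V}_1 \rfloor (r, s), \dots, r'_\ell = \lfloor \mathcal{V}_\ell \rfloor (r, s) \} \\
\mathrm{dom} \lfloor \mathrm{sample} \rfloor := \mathrm{dom} \lfloor x_i \rfloor := \mathrm{dom} \lfloor \alpha_j \rfloor := \mathrm{dom} \lfloor y \rfloor := \mathrm{dom} \lfloor r' \rfloor &:= \mathbb{R}^m \times S_n \\
\mathrm{dom} \lfloor f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) \rfloor &:= \mathrm{dom} \lfloor \mathcal{M}_1 \rfloor \cap \cdots \cap \mathrm{dom} \lfloor \mathcal{M}_\ell \rfloor \\
\mathrm{dom} \lfloor \lambda y.\, \mathcal{M} \rfloor := \mathrm{dom} \lfloor Y\mathcal{M} \rfloor := \mathrm{dom} \lfloor \mathrm{score}(\mathcal{M}) \rfloor &:= \mathrm{dom} \lfloor \mathcal{M} \rfloor \\
\mathrm{dom} \lfloor \mathcal{M}\,\mathcal{N} \rfloor &:= \mathrm{dom} \lfloor \mathcal{M} \rfloor \cap \mathrm{dom} \lfloor \mathcal{N} \rfloor \\
\mathrm{dom} \lfloor \mathrm{if}(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) \rfloor &:= \mathrm{dom} \lfloor \mathcal{L} \rfloor \cap \mathrm{dom} \lfloor \mathcal{M} \rfloor \cap \mathrm{dom} \lfloor \mathcal{N} \rfloor
\end{align*}

(a) Definition of $\mathrm{dom} \lfloor \cdot \rfloor$

\begin{align*}
\lfloor \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \rfloor (r, s) &:= f(r'_1, \dots, r'_\ell), \text{ where for } 1 \le i \le \ell,\ \lfloor \mathcal{V}_i \rfloor (r, s) = r'_i \\
\lfloor x_i \rfloor (r, s) &:= r_i \\
\lfloor \alpha_j \rfloor (r, s) &:= s_j \\
\lfloor y \rfloor (r, s) &:= y \\
\lfloor r' \rfloor (r, s) &:= r' \\
\lfloor f(\mathcal{M}_1, \dots, \mathcal{M}_\ell) \rfloor (r, s) &:= f(\lfloor \mathcal{M}_1 \rfloor (r, s), \dots, \lfloor \mathcal{M}_\ell \rfloor (r, s)) \\
\lfloor \lambda y.\, \mathcal{M} \rfloor (r, s) &:= \lambda y.\, \lfloor \mathcal{M} \rfloor (r, s) \\
\lfloor \mathcal{M}\,\mathcal{N} \rfloor (r, s) &:= (\lfloor \mathcal{M} \rfloor (r, s))\,(\lfloor \mathcal{N} \rfloor (r, s)) \\
\lfloor Y\mathcal{M} \rfloor (r, s) &:= Y(\lfloor \mathcal{M} \rfloor (r, s)) \\
\lfloor \mathrm{if}(\mathcal{L} \le 0, \mathcal{M}, \mathcal{N}) \rfloor (r, s) &:= \mathrm{if}(\lfloor \mathcal{L} \rfloor (r, s) \le 0, \lfloor \mathcal{M} \rfloor (r, s), \lfloor \mathcal{N} \rfloor (r, s)) \\
\lfloor \mathrm{sample} \rfloor (r, s) &:= \mathrm{sample} \\
\lfloor \mathrm{score}(\mathcal{M}) \rfloor (r, s) &:= \mathrm{score}(\lfloor \mathcal{M} \rfloor (r, s))
\end{align*}

(b) Definition of $\lfloor \cdot \rfloor$ on $\mathrm{dom} \lfloor \cdot \rfloor$


Fig. 5: Formal definition of the instantiation and partial evaluation function $\lfloor \cdot \rfloor$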


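semantics is presented as a rewrite system of symbolic configurations, which
are defined to be triples of the form $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle$, where for some $m$ and $n$,
$\mathcal{M} \in \Lambda_{(m,n)}$, $U \subseteq \mathrm{dom} \lfloor \mathcal{M} \rfloor \subseteq \mathbb{R}^m \times S_n$ is measurable, and $w : \mathbb{R}^m \times S \rightharpoonup \mathbb{R}_{\geq 0}$
with $\mathrm{dom}(w) = U$. Thus we aim to prove the following result (writing $1$ for the
constant function $\lambda(r, s).\, 1$):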


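Theorem 1. Let $M$ be a term with free variables amongst $x_1, \dots, x_m$.


1. (Soundness). If $\langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, \mathsf{w}, U \rangle\!\rangle$ then for all $(r, s) \in U$ it holds
$\mathrm{weight}_M(r, s) = \mathsf{w}(r, s)$ and $\mathrm{value}_M(r, s) = \lfloor \mathcal{V} \rfloor (r, s)$.

2. (Completeness). If $r \in \mathbb{R}^m$ and $\langle M[r/x], 1, [] \rangle \to^* \langle V, w, s \rangle$ then there exists
$\langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, \mathsf{w}, U \rangle\!\rangle$ such that $(r, s) \in U$.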
As formalised by Thm. 1, the key intuition behind symbolic configurations
$\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle$ (that are reachable from a given $\langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle$) is that, whenever $\mathcal{M}$ is
a symbolic value:


- $\mathcal{M}$ gives a correct local view of $\mathrm{value}_M$ (restricted to $U$), and
- $w$ gives a correct local view of $\mathrm{weight}_M$ (restricted to $U$);


moreover, the respective third components $U$ (of the symbolic configurations
$\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle$) cover $T_{M,\mathrm{term}}$.

To establish Thm. 1, we introduce symbolic reduction contexts and symbolic
redexes. These are presented in Fig. 4c and extend the usual notions
(replacing real constants with arbitrary symbolic values of type R).

Using Lem. 2 we obtain: 


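Lemma 4. If $\mathcal{R}$ is a symbolic redex and $(r, s) \in \mathrm{dom} \lfloor \mathcal{R} \rfloor$ then $\lfloor \mathcal{R} \rfloor (r, s)$ is a
redex.


The following can be proven by a straightforward induction:


Lemma 5 (Subject Construction). Let $\mathcal{M}$ be a symbolic term.


1. If $\mathcal{M}$ is a symbolic value then for all symbolic contexts $\mathcal{E}$ and symbolic redexes
$\mathcal{R}$, $\mathcal{M} \not\equiv \mathcal{E}[\mathcal{R}]$.

2. If $\mathcal{M} \equiv \mathcal{E}_1[\mathcal{R}_1] \equiv \mathcal{E}_2[\mathcal{R}_2]$ then $\mathcal{E}_1 \equiv \mathcal{E}_2$ and $\mathcal{R}_1 \equiv \mathcal{R}_2$.

3. If $\mathcal{M}$ is not a symbolic value and $\mathrm{dom} \lfloor \mathcal{M} \rfloor \neq \emptyset$ then there exist $\mathcal{E}$ and $\mathcal{R}$
such that $\mathcal{M} \equiv \mathcal{E}[\mathcal{R}]$.


The partial instantiation function also extends to symbolic contexts $\mathcal{E}$ in the
evident way; we give the full definition in [34].

Now, we introduce the following rules for symbolic redex contractions: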


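\begin{align*}
\langle\!\langle (\lambda y.\, \mathcal{M})\,\mathcal{V}, w, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{M}[\mathcal{V}/y], w, U \rangle\!\rangle \\
\langle\!\langle f(\mathcal{V}_1, \dots, \mathcal{V}_\ell), w, U \rangle\!\rangle &\Rightarrow \langle\!\langle \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell), w, \mathrm{dom} \lVert \boxed{f}(\mathcal{V}_1, \dots, \mathcal{V}_\ell) \rVert \cap U \rangle\!\rangle \\
\langle\!\langle Y(\lambda y.\, \mathcal{M}), w, U \rangle\!\rangle &\Rightarrow \langle\!\langle \lambda z.\, \mathcal{M}[Y(\lambda y.\, \mathcal{M})/y]\, z, w, U \rangle\!\rangle \\
\langle\!\langle \mathrm{if}(\mathcal{V} \le 0, \mathcal{M}, \mathcal{N}), w, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{M}, w, \lVert \mathcal{V} \rVert^{-1}(-\infty, 0] \cap U \rangle\!\rangle \\
\langle\!\langle \mathrm{if}(\mathcal{V} \le 0, \mathcal{M}, \mathcal{N}), w, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{N}, w, \lVert \mathcal{V} \rVert^{-1}(0, \infty) \cap U \rangle\!\rangle \\
\langle\!\langle \mathrm{sample}, w, U \rangle\!\rangle &\Rightarrow \langle\!\langle \alpha_{n+1}, w', U' \rangle\!\rangle \qquad (U \subseteq \mathbb{R}^m \times S_n) \\
\langle\!\langle \mathrm{score}(\mathcal{V}), w, U \rangle\!\rangle &\Rightarrow \langle\!\langle \mathcal{V}, \lVert \mathcal{V} \rVert \cdot w, \lVert \mathcal{V} \rVert^{-1}[0, \infty) \cap U \rangle\!\rangle
\end{align*}

In the rule for sample, $U' = \{ (r, s \mathbin{+\!\!+} [s']) \mid (r, s) \in U \wedge s' \in (0, 1) \}$ and
$w'(r, s \mathbin{+\!\!+} [s']) := w(r, s)$; in the rule for score($\mathcal{V}$), $(\lVert \mathcal{V} \rVert \cdot w)(r, s) := \lVert \mathcal{V} \rVert (r, s) \cdot w(r, s)$.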
The rules are designed to closely mirror their concrete counterparts. Crucially,
the rule for sample introduces a "fresh" sampling variable, and the two
rules for conditionals split the last component $U \subseteq \mathbb{R}^m \times S_n$ according to whether
$\lVert \mathcal{V} \rVert (r, s) \le 0$ or $\lVert \mathcal{V} \rVert (r, s) > 0$. The "delay" contraction (second rule) is introduced
for a technical reason: ultimately, to enable item 1 (Soundness). Otherwise
it is, for example, unclear whether $\lambda y.\, \alpha_1 + 1$ should correspond to $\lambda y.\, 0.5 + 1$ or
$\lambda y.\, 1.5$ for $s_1 = 0.5$.
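
How the rules for sample, conditionals and score transform the weight function and the trace set can be sketched in a few lines. The encoding below is our own simplification, not the paper's formalisation: there are no free variables (so $m = 0$), terms are not represented at all, $U$ and $w$ are Python callables on traces, and a symbolic value $\mathcal{V}$ is given directly by its denotation $\lVert \mathcal{V} \rVert$. The test $\alpha_3 \le 0.5$ is encoded as $\alpha_3 - 0.5 \le 0$.

```python
import math

def normal_pdf(x, mean=1.1, std=0.1):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# A configuration keeps only (n, w, U): the number n of sampling variables,
# the weight function w and the membership test U, both taking a trace s.

def rule_sample(config):
    """sample: introduce alpha_{n+1}, extend every trace in U by one draw."""
    n, w, U = config
    new_w = lambda s: w(s[:n])
    new_U = lambda s: len(s) == n + 1 and 0 < s[n] < 1 and U(s[:n])
    return (n + 1, new_w, new_U)

def rule_if(config, V, then_branch):
    """if: restrict U to ||V|| <= 0 (then-branch) or ||V|| > 0 (else-branch)."""
    n, w, U = config
    if then_branch:
        return (n, w, lambda s: U(s) and V(s) <= 0)
    return (n, w, lambda s: U(s) and V(s) > 0)

def rule_score(config, V):
    """score: multiply the weight by ||V|| and keep traces with ||V|| >= 0."""
    n, w, U = config
    return (n, lambda s: V(s) * w(s), lambda s: U(s) and V(s) >= 0)

# Replay the branch of the running example that ends in U.
config = (0, lambda s: 1.0, lambda s: len(s) == 0)
config = rule_sample(config)                                   # alpha_1
config = rule_if(config, lambda s: s[0] * 3, then_branch=False)
config = rule_sample(config)                                   # alpha_2
config = rule_sample(config)                                   # alpha_3
config = rule_if(config, lambda s: s[2] - 0.5, then_branch=False)
config = rule_if(config, lambda s: s[0] * 3 - s[1], then_branch=True)
config = rule_score(config, lambda s: normal_pdf(s[1]))
n, w, U = config
print(n, U([0.2, 0.9, 0.7]), w([0.2, 0.9, 0.7]))
```

Each conditional only narrows the membership test, so the two branches of one conditional always yield disjoint trace sets.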

Finally we lift this to arbitrary symbolic terms using the obvious rule for 
symbolic evaluation contexts: 


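\[
\dfrac{\langle\!\langle \mathcal{R}, w, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{R}', w', U' \rangle\!\rangle}{\langle\!\langle \mathcal{E}[\mathcal{R}], w, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{E}[\mathcal{R}'], w', U' \rangle\!\rangle}
\]

Note that we do not need rules corresponding to reductions to fail because
the third component of the symbolic configurations "filters out" the pairs $(r, s)$
corresponding to undefined behaviour. In particular, the following holds: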


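Lemma 6. Suppose $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle$ is a symbolic configuration and
$\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{N}, w', U' \rangle\!\rangle$. Then $\langle\!\langle \mathcal{N}, w', U' \rangle\!\rangle$ is a symbolic configuration.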


A key advantage of the symbolic execution is that the induced computation 
tree is finitely branching, since branching only arises from conditionals, splitting 
the trace space into disjoint subsets. This contrasts with the concrete situation 
(from Sec. 3), in which sampling creates uncountably many branches. 


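Lemma 7 (Basic Properties). Let $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle$ be a symbolic configuration.
Then


1. There are at most countably many distinct $U'$ such that $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{N}, w', U' \rangle\!\rangle$.
2. If $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}_i, w_i, U_i \rangle\!\rangle$ for $i \in \{1, 2\}$ then $U_1 = U_2$ or $U_1 \cap U_2 = \emptyset$.
3. If $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{E}_i[\mathrm{sample}], w_i, U_i \rangle\!\rangle$ for $i \in \{1, 2\}$ then $U_1 = U_2$ or
$U_1 \cap U_2 = \emptyset$.

Crucially, there is a correspondence between the concrete and symbolic semantics
in that they can "simulate" each other:

Proposition 1 (Correspondence). Suppose $\langle\!\langle \mathcal{M}, \mathsf{w}, U \rangle\!\rangle$ is a symbolic configuration,
and $(r, s) \in U$. Let $M \equiv \lfloor \mathcal{M} \rfloor (r, s)$ and $w := \mathsf{w}(r, s)$. Then
1. If $\langle\!\langle \mathcal{M}, \mathsf{w}, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{N}, \mathsf{w}', U' \rangle\!\rangle$ and $(r, s \mathbin{+\!\!+} s') \in U'$ then
$\langle M, w, s \rangle \to \langle \lfloor \mathcal{N} \rfloor (r, s \mathbin{+\!\!+} s'), \mathsf{w}'(r, s \mathbin{+\!\!+} s'), s \mathbin{+\!\!+} s' \rangle$.
2. If $\langle M, w, s \rangle \to \langle N, w', s' \rangle$ then there exists $\langle\!\langle \mathcal{M}, \mathsf{w}, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{N}, \mathsf{w}', U' \rangle\!\rangle$ such
that $\lfloor \mathcal{N} \rfloor (r, s') \equiv N$, $\mathsf{w}'(r, s') = w'$ and $(r, s') \in U'$.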


As a consequence of Lem. 2, we obtain a proof of Thm. 1. 


6 Densities of Almost Surely Terminating Programs are 
Differentiable Almost Everywhere 


So far we have seen that the symbolic execution semantics provides a sound and 
complete way to reason about the weight and value functions. In this section we 
impose further restrictions on the primitive operations and the terms to obtain 
results about the differentiability of these functions. 

Henceforth we assume Assumption 1 and we fix a term $M$ with free variables
amongst $x_1, \dots, x_m$.

From Lem. 3 we immediately obtain the following:

Lemma 8. Let $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle$ be a symbolic configuration such that $w$ is differentiable
on $\mathring{U}$ and $\mu(\partial U) = 0$. If $\langle\!\langle \mathcal{M}, w, U \rangle\!\rangle \Rightarrow \langle\!\langle \mathcal{M}', w', U' \rangle\!\rangle$ then $w'$ is differentiable
on $\mathring{U}'$ and $\mu(\partial U') = 0$.


6.1 Differentiability on Terminating Traces 


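As an immediate consequence of the preceding, Lem. 3 and the Soundness (item 1
of Thm. 1), whenever $\langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle$ then $\mathrm{weight}_M$ and $\mathrm{value}_M$ are
differentiable everywhere in $\mathring{U}$.

Recall the set $T_{M,\mathrm{term}}$ of $(r, s) \in \mathbb{R}^m \times S$ from Eq. (1) for which $M$ terminates.
We abbreviate $T_{M,\mathrm{term}}$ to $T_{\mathrm{term}}$ and define

\begin{align*}
T_{\mathrm{term}} &:= T_{M,\mathrm{term}} = \{ (r, s) \in \mathbb{R}^m \times S \mid \exists V, w.\ \langle M[r/x], 1, [] \rangle \to^* \langle V, w, s \rangle \} \\
T^{\mathrm{int}}_{\mathrm{term}} &:= \bigcup \{ \mathring{U} \mid \exists \mathcal{V}, w.\ \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle \}
\end{align*}

By Completeness (item 2 of Thm. 1), $T_{\mathrm{term}} = \bigcup \{ U \mid \exists \mathcal{V}, w.\ \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle \}$.
Therefore, being countable unions of measurable sets (Lemmas 6
and 7), $T_{\mathrm{term}}$ and $T^{\mathrm{int}}_{\mathrm{term}}$ are measurable.

By what we have said above, $\mathrm{weight}_M$ and $\mathrm{value}_M$ are differentiable everywhere
on $T^{\mathrm{int}}_{\mathrm{term}}$. Observe that in general, $T^{\mathrm{int}}_{\mathrm{term}} \subsetneq T_{\mathrm{term}}$. However,

\[
\mu(T_{\mathrm{term}} \setminus T^{\mathrm{int}}_{\mathrm{term}})
= \mu\Big( \bigcup_{U :\, \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle} (U \setminus \mathring{U}) \Big)
\le \sum_{U :\, \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{V}, w, U \rangle\!\rangle} \mu(\partial U) = 0 \tag{2}
\]

The first equation holds because the $U$-indexed union is of pairwise disjoint sets.
The inequality is due to $(U \setminus \mathring{U}) \subseteq \partial U$. The last equation above holds because
each $\mu(\partial U) = 0$ (Assumption 1 and Lem. 8).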

Thus we conclude: 


Theorem 2. Let $M$ be an SPCF term. Then its weight function $\mathrm{weight}_M$ and
value function $\mathrm{value}_M$ are differentiable for almost all terminating traces.


6.2 Differentiability for Almost Surely Terminating Terms 


Next, we would like to extend this insight for almost surely terminating terms to
suitable subsets of $\mathbb{R}^m \times S$, the union of which constitutes almost the entirety of
$\mathbb{R}^m \times S$. Therefore, it is worth examining consequences of almost sure termination
(see Def. 1).

We say that $(r, s) \in \mathbb{R}^m \times S$ is maximal (for $M$) if $\langle M[r/x], 1, [] \rangle \to^* \langle N, w, s \rangle$
and for all $s' \in S \setminus \{[]\}$ and $N'$, $\langle N, w, s \rangle \not\to^* \langle N', w', s \mathbin{+\!\!+} s' \rangle$. Intuitively,
$s$ contains a maximal number of samples to reduce $M[r/x]$. Let $T_{\max}$ be
the set of maximal $(r, s)$.

Note that $T_{\mathrm{term}} \subseteq T_{\max}$ and there are terms for which the inclusion is strict
(e.g. for the diverging term $M \equiv Y(\lambda f.\, f)$, $[] \in T_{\max}$ but $[] \notin T_{\mathrm{term}}$). Besides,
Fig. 6: Illustration of how $\mathbb{R}^m \times S$, visualised as the entire rectangle, is partitioned
to prove Thm. 3. The value function returns $\bot$ in the red dotted area
and a closed value elsewhere (i.e. in the blue shaded area).


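$T_{\max}$ is measurable because, thanks to Prop. 1, for every $n \in \mathbb{N}$,

\[
\{ (r, s) \in \mathbb{R}^m \times S_n \mid \langle M[r/x], 1, [] \rangle \to^* \langle N, w, s \rangle \}
= \bigcup_{U :\, \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{N}, w, U \rangle\!\rangle} U \cap (\mathbb{R}^m \times S_n)
\]

and the RHS is a countable union of measurable sets (Lemmas 6 and 7).
The following is a consequence of the definition of almost sure termination
and a corollary of Fubini's theorem (see [34] for details):


Lemma 9. If $M$ terminates almost surely then $\mu(T_{\max} \setminus T_{\mathrm{term}}) = 0$.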


Now, observe that for all $(r, s) \in \mathbb{R}^m \times S$, exactly one of the following holds:
1. $(r, s)$ is maximal
2. for a proper prefix $s'$ of $s$, $(r, s')$ is maximal
3. $(r, s)$ is stuck, because $s$ does not contain enough randomness.
Formally, we say $(r, s)$ is stuck if $\langle M[r/x], 1, [] \rangle \to^* \langle E[\mathrm{sample}], w, s \rangle$, and we
let $T_{\mathrm{stuck}}$ be the set of all $(r, s)$ which get stuck. Thus,

\[
\mathbb{R}^m \times S = T_{\max} \cup T_{\mathrm{pref}} \cup T_{\mathrm{stuck}}
\]

where $T_{\mathrm{pref}} := \{ (r, s \mathbin{+\!\!+} s') \mid (r, s) \in T_{\max} \wedge s' \neq [] \}$, and the union is disjoint.
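
The three-way case distinction can be made concrete for the pedestrian example. The sketch below is hypothetical in its details: the body of `walk` is reconstructed from the fragments quoted in Sec. 5, the program is closed (so $r$ is empty), and the score step is left out because a density never fails. For this particular program every maximal trace is also terminating, which need not hold in general.

```python
def classify(trace):
    """Classify a trace for the (assumed) pedestrian program.

    Returns "maximal", "pref" or "stuck".
    """
    used = 0

    def sample():
        nonlocal used
        if used == len(trace):
            # The run has reached E[sample] with no draw left in the trace.
            raise LookupError("trace exhausted")
        used += 1
        return trace[used - 1]

    def walk(x):
        if x <= 0:
            return 0.0
        s = sample()
        if sample() <= 0.5:
            return s + walk(x + s)
        return s + walk(x - s)

    try:
        walk(sample() * 3)
    except LookupError:
        return "stuck"
    # The run finished: it is maximal if it consumed the whole trace,
    # otherwise a proper prefix of the trace is maximal.
    return "maximal" if used == len(trace) else "pref"

for t in ([], [0.2, 0.9], [0.2, 0.9, 0.7], [0.2, 0.9, 0.7, 0.5]):
    print(t, classify(t))
```

Every trace receives exactly one label, mirroring the disjointness of the union above.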
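Defining $T^{\mathrm{int}}_{\mathrm{stuck}} := \bigcup \{ \mathring{U} \mid \langle\!\langle M, 1, \mathbb{R}^m \rangle\!\rangle \Rightarrow^* \langle\!\langle \mathcal{E}[\mathrm{sample}], w, U \rangle\!\rangle \}$ we can argue
analogously to Eq. (2) that $\mu(T_{\mathrm{stuck}} \setminus T^{\mathrm{int}}_{\mathrm{stuck}}) = 0$.
Moreover, for $T^{\mathrm{int}}_{\mathrm{pref}} := \{ (r, s \mathbin{+\!\!+} s') \mid (r, s) \in T^{\mathrm{int}}_{\mathrm{term}} \text{ and } [] \neq s' \in S \}$ it holds

\[
T_{\mathrm{pref}} \setminus T^{\mathrm{int}}_{\mathrm{pref}} = \bigcup_{n \in \mathbb{N}} \{ (r, s \mathbin{+\!\!+} s') \mid (r, s) \in T_{\max} \setminus T^{\mathrm{int}}_{\mathrm{term}} \wedge s' \in S_n \}
\]

and hence, $\mu(T_{\mathrm{pref}} \setminus T^{\mathrm{int}}_{\mathrm{pref}}) \le \sum_{n \in \mathbb{N}} \mu(T_{\max} \setminus T^{\mathrm{int}}_{\mathrm{term}}) = 0$.
Finally, we define

\[
T := T^{\mathrm{int}}_{\mathrm{term}} \cup T^{\mathrm{int}}_{\mathrm{pref}} \cup T^{\mathrm{int}}_{\mathrm{stuck}}
\]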
Clearly, this is an open set and the situation is illustrated in Fig. 6. By what we
have seen,

\[
\mu((\mathbb{R}^m \times S) \setminus T) = \mu(T_{\mathrm{term}} \setminus T^{\mathrm{int}}_{\mathrm{term}}) + \mu(T_{\mathrm{pref}} \setminus T^{\mathrm{int}}_{\mathrm{pref}}) + \mu(T_{\mathrm{stuck}} \setminus T^{\mathrm{int}}_{\mathrm{stuck}}) = 0
\]

Moreover, to conclude the proof of our main result Thm. 3 it suffices to note:
1. $\mathrm{weight}_M$ and $\mathrm{value}_M$ are differentiable everywhere on $T^{\mathrm{int}}_{\mathrm{term}}$ (as for Thm. 2),
and
2. $\mathrm{weight}_M(r, s) = 0$ and $\mathrm{value}_M(r, s) = \bot$ for $(r, s) \in T^{\mathrm{int}}_{\mathrm{pref}} \cup T^{\mathrm{int}}_{\mathrm{stuck}}$.


Theorem 3. Let $M$ be an SPCF term (possibly with free variables of type R)
which terminates almost surely. Then its weight function $\mathrm{weight}_M$ and value
function $\mathrm{value}_M$ are differentiable almost everywhere.


We remark that almost sure termination was not used in our development 
until the proof of Lem. 9. For Thm. 3 we could have instead directly assumed the 
conclusion of Lem. 9; that is, almost all maximal traces are terminating. This 
is a strictly weaker condition than almost sure termination. The exposition we 
give is more appropriate: almost sure termination is a standard notion, and the 
development of methods to prove almost sure termination is a subject of active 
research. 

We also note that the technique used in this paper to establish almost ev- 
erywhere differentiability could be used to target another “almost everywhere” 
property instead: one can simply remove the requirement that elements of F are 
differentiable, and replace it with the desired property. A basic example of this 
is smoothness. 


7 Conclusion 


We have solved an open problem in the theory of probabilistic programming. 
This is mathematically interesting, and motivated the development of stochastic 
symbolic execution, a more informative form of operational semantics in this 
context. The result is also of major practical interest, since almost everywhere 
differentiability is necessary for correct gradient-based inference. 


Related Work. This problem was partially addressed in the work of Zhou et 
al. [55] who prove a restricted form of our theorem for recursion-free first-order 
programs with analytic primitives. Our stochastic symbolic execution is related 
to their compilation scheme, which we extend to a more general language. 

The idea of considering the possible control paths through a probabilistic
program is fairly natural and not new to this paper; it has been used towards
the design of specialised inference algorithms for probabilistic programming, see 
[11,56]. To our knowledge, this is the first semantic formalisation of the concept, 
and the first time it is used to reason about whole-program density. 

The notions of weight function and value function in this paper are inspired 
by the more standard trace-based operational semantics of Borgström et al. [8] 
(see also [52,31]). 
Mazza and Pagani [35] study the correctness of automatic differentiation
(AD) of purely deterministic programs. This problem is orthogonal to the work 
reported here, but it is interesting to combine their result with ours. Specifically, 
we show a.e. differentiability whilst [35] proves a.s. correctness of AD on the 
differentiable domain. Combining both results one concludes that for a deter- 
ministic program, AD returns a correct gradient a.s. on the entire domain. Going 
deeper into the comparison, Mazza and Pagani propose a notion of admissible 
primitive function strikingly similar to ours: given continuity, their condition 2 
and our condition 3 are equivalent. On the other hand we require admissible 
functions to be differentiable, when they are merely continuous in [35]. Finally, 
we conjecture that “stable points”, a central notion in [35], have a clear counter- 
part within our framework: for a symbolic evaluation path arriving at (V,w,U), 
for V a symbolic value, the points of U are precisely the stable points. 

Our work is also connected to recent developments in differentiable program- 
ming. Lee et al. [30] study the family of piecewise functions under analytic parti- 
tion, or just “PAP” functions. PAP functions are a well-behaved family of almost 
everywhere differentiable functions, which can be used to reason about automatic 
differentiation in recursion-free first-order programs. An interesting question is 
whether this can be extended to a more general language, and whether densities 
of almost surely terminating SPCF programs are PAP functions. (See also [19,9] 
for work on differentiable programs without conditionals.) 

A similar class of functions is also introduced by Bolte and Pauwels [7] in very 
recent work; this is used to prove a convergence result for stochastic gradient 
descent in deep learning. Whether this class of functions can be used to reason 
about probabilistic program densities remains to be explored. 

Finally we note that open logical relations [1] are a convenient proof technique 
for establishing properties of programs which hold at first order, such as almost 
everywhere differentiability. This approach remains to be investigated in this 
context, as the connection with probabilistic densities is not immediate. 


Further Directions. This investigation would benefit from a denotational 
treatment; this is not currently possible as existing models of probabilistic pro- 
gramming do not account for differentiability. 

In another direction, it is likely that we can generalise the main result by 
extending SPCF with recursive types, as in [51], and, more speculatively, first- 
class differential operators as in [17]. It would also be useful to add to SPCF a 
family of discrete distributions, and more generally continuous-discrete mixtures, 
which have practical applications [36]. 

Our work will have interesting implications in the correctness of various 
gradient-based inference algorithms, such as the recent discontinuous HMC [39] 
and reparameterisation gradient for non-differentiable models [32]. But given the 
lack of guarantees of correctness properties available until now, these algorithms 
have not yet been developed in full generality, leaving many perspectives open. 
Abstract. Graded type theories are an emerging paradigm for
augmenting the reasoning power of types with parameterizable, fine-grained
analyses of program properties. There have been many such theories 
in recent years which equip a type theory with quantitative dataflow 
tracking, usually via a semiring-like structure which provides analysis on 
variables (often called ‘quantitative’ or ‘coeffect’ theories). We present 
Graded Modal Dependent Type Theory (GRTT for short), which equips 
a dependent type theory with a general, parameterizable analysis of the 
flow of data, both in and between computational terms and types. In 
this theory, it is possible to study, restrict, and reason about data use in 
programs and types, enabling, for example, parametric quantifiers and 
linearity to be captured in a dependent setting. We propose GRTT, study 
its metatheory, and explore various case studies of its use in reasoning 
about programs and studying other type theories. We have implemented 
the theory and highlight the interesting details, including showing an 
application of grading to optimising the type checking procedure itself. 


1 Introduction 


The difference between simply-typed, polymorphically-typed, and dependently- 
typed languages can be characterised by the dataflow permitted by each type 
theory. In each, dataflow can be enacted by substituting a term for occurrences 
of a variable in another term, the scope of which is delineated by a binder. In 
the simply-typed $\lambda$-calculus, data can only flow in ‘computational’ terms;
computations and types are separate syntactic categories, with variables, bindings
($\lambda$), and substitution, and thus dataflow, only at the computational level. In
contrast, polymorphic calculi like System F [26,52] permit dataflow within types,
via type quantification ($\forall$), and a limited form of dataflow from computations to
types, via type abstraction ($\Lambda$) and type application. Dependently-typed calculi
(e.g., [14,40,41,42]) break down the barrier between computations and types further:
variables are bound simultaneously in types and computations, such that
data can flow both to computations and types via dependent functions ($\Pi$) and
application. This pervasive dataflow enables the Curry-Howard correspondence 
to be leveraged for program reasoning and theorem proving [59]. However, un- 
restricted dataflow between computations and types can impede reasoning and 
can interact poorly with other type theoretic ideas. 
Firstly, System F allows parametric reasoning and notions of representation
independence [53,57], but this is lost in general in dependently-typed
languages when quantifying over higher-kinded types [45] (rather than just ‘small’
types [7,36]). Furthermore, unrestricted dataflow impedes efficient compilation 
as compilers do not know, from the types alone, where a term is actually needed. 
Additional static analyses are needed to recover dataflow information for opti- 
misation and reasoning. For example, a term shown to be used only for type 
checking (not flowing to the computational ‘run time’ level) can be erased [9]. 
Thus, dependent theories do not expose the distinction between proof relevant 
and irrelevant terms, requiring extensions to capture irrelevance [4,50,51]. Whilst 
unrestricted dataflow between computations and terms has its benefits, the per- 
missive nature of dependent types can hide useful information. This permissive- 
ness also interacts poorly with other type theories which seek to deliberately 
restrict dataflow, notably linear types. 

Linear types allow data to be treated as a ‘resource’ which must be consumed 
exactly once: linearly-typed values are restricted to linear dataflow [27,58,60]. 
Reasoning about resourceful data has been exploited by several languages, e.g., 
ATS [54], Alms [56], Clean [18], Granule [46], and Linear Haskell [8]. However, 
linear dataflow is rare in a dependently-typed setting. Consider typing the body 
of the polymorphic identity function in Martin-Löf type theory:


$a : \mathsf{Type},\ x : a \vdash x : a$


This judgment uses a twice (typing x in the context and the subject of the judg- 
ment) and x once in the term but not at all in the type. There have been vari- 
ous attempts to meaningfully reconcile linear and dependent types [12,15,37,39] 
usually by keeping them separate, allowing types to depend only on non-linear 
variables. All such theories cannot distinguish variables used for computation 
from those used purely for type formation, which could be erased at runtime. 

Recent work by McBride [43], refined by Atkey [6], generalises ideas from 
‘coeffect analyses’ (variable usage analyses, like that of Petricek et al. [49]) to a 
dependently-typed setting to reconcile the ubiquitous flow of data in dependent 
types with the restricted dataflow of linearity. This approach, called Quantitative 
Type Theory (QTT), types the above example as: 


\[ a \overset{0}{:} \mathsf{Type},\ x \overset{1}{:} a \vdash x \overset{1}{:} a \]


The annotation 0 on a explains that we can use a to form a type, but we 
cannot, or do not, use it at the term level, thus it can be erased at runtime. The 
cornerstone of QTT’s approach is that dataflow of a term to the type level counts 
as 0 use, so arbitrary type-level use is allowed whilst still permitting quantitative 
analysis of computation-level dataflow. Whilst this gives a useful way to relate 
linear and dependent types, it cannot however reason about dataflow at the type- 
level (all type-level usage counts as 0). Thus, for example, QTT cannot express 
that a variable is used just computationally but not at all in types. 

In an extended abstract, Abel proposes a generalisation of QTT to track vari- 
able use in both types and computations [2], suggesting that tracking in types 


464 B. Moon et al. 


enables type checking optimisations and increased expressivity. We develop a 
core dependent type theory along the same lines, using the paradigm of grading: 
graded systems augment types with additional information, capturing the struc- 
ture of programs [23,46]. We therefore name our approach Graded Modal Depen- 
dent Type Theory (GRTT for short). Our type theory is parameterised by a semir- 
ing which, like other coeffect and quantitative approaches [3,6,10,25,43,49,61], 
describes dataflow through a program, but in both types and computations equally, 
remedying QTT’s inability to track type-level use. We extend Abel’s initial idea 
by presenting a rich language, including dependent tensors, a complete metathe- 
ory, and a graded modality which aids the practical use of this approach (e.g., 
enabling functions to use components of data non-uniformly). The result is a 
calculus which extends the power of existing non-dependent graded languages, 
like Granule [46], to a dependent setting. 

We begin with the definition of GRTT in Section 2, before demonstrating the 
power of GRTT through case studies in Section 3, where we show how to use 
grading to restrict GRTT terms to simply-typed reasoning, parametric reasoning 
(regaining universal quantification smoothly within a dependent theory), exis- 
tential types, and linear types. The calculus can be instantiated to different kinds 
of dataflow reasoning: we show an example application to information-flow secu- 
rity. We then show the metatheory of GRTT in Section 4: admissibility of graded 
structural rules, substitution, type preservation, and strong normalisation. 

We implemented a prototype language based on GRTT called Gerty.³ We
briefly mention its syntax in Section 2.5 for use in examples. Later, Section 5 
describes how the formal definition of GRTT is implemented as a bidirectional 
type checking algorithm, interfacing with an SMT solver to solve constraints 
over grades. Furthermore, Abel conjectured that a quantitative dependent the- 
ory could enable usage-based optimisation of type-checking itself [2], which would 
assist dependently-typed programming at scale. We validate this claim in Sec- 
tion 5 showing a grade-directed optimisation to Gerty’s type checker. 

Section 6 discusses next steps for increasing the expressive power of GRTT. 
Full proofs and details are provided in the extended version of this paper [44]. 

Gerty has some similarity to Granule [46]: both are functional languages 
with graded types. However, Granule has a linearly typed core and no dependent 
types (only indexed types), thus has no need for resource tracking at the type 
level (type indices are not subject to tracking and their syntax is restricted). 


2 GrTT: Graded Modal Dependent Type Theory 


GRTT augments a standard presentation of dependent type theory with ‘grades’ 
(elements of a semiring) which account for how variables are used, i.e., their 
dataflow. Whilst existing work uses grades to describe usage only in computa- 
tional terms (e.g. [10]), GRTT incorporates additional grades to account for how 
variables are used in types. We introduce here the syntax and typing, and briefly 
show the syntax of the implementation. Section 4 describes its metatheory. 


3 https://github.com/granule-project/gerty/releases/tag/esop2021


2.1 Syntax


The syntax of GRTT is that of a standard Martin-Löf type theory, with the
addition of a graded modality and grade annotations on function and tensor
binders. Throughout, s and r range over grades, which are elements of a semiring
$(R, *, 1, +, 0)$. It is instructive to instantiate this semiring to the natural number
semiring $(\mathbb{N}, \times, 1, +, 0)$, which captures the exact number of times variables are
used. We appeal to this example in descriptions here.
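
As a rough illustration of the semiring parameter, the following Python sketch packages the five components $(R, *, 1, +, 0)$ as a record and instantiates it to the natural numbers. The names `Semiring`, `NAT`, and `count_uses` are hypothetical and are not part of GRTT or Gerty; the sketch only shows how a grade can be built from 0 by repeated addition of 1.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

R = TypeVar("R")


@dataclass(frozen=True)
class Semiring(Generic[R]):
    """A semiring (R, *, 1, +, 0); the field names are illustrative only."""
    zero: R
    one: R
    add: Callable[[R, R], R]
    mul: Callable[[R, R], R]


# Natural-number semiring: a grade counts the exact number of uses.
NAT = Semiring(zero=0, one=1, add=lambda r, s: r + s, mul=lambda r, s: r * s)


def count_uses(sr, uses):
    """Build the grade for `uses` single uses: 0 + 1 + ... + 1."""
    total = sr.zero
    for _ in range(uses):
        total = sr.add(total, sr.one)
    return total
```

Under this reading, `count_uses(NAT, 2)` is the grade of a variable used exactly twice, and multiplying any grade by `NAT.zero` yields `NAT.zero`.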

GRTT has a single syntactic sort for computations and types: 


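\[
\begin{array}{lrcl}
\text{(terms)} & t, A, B, C & ::= & x \mid \mathsf{Type}_l \\
 & & \mid & (x :_{(s,r)} A) \to B \mid \lambda x.t \mid t_1\, t_2 \\
 & & \mid & (x :_r A) \otimes B \mid (t_1, t_2) \mid \mathsf{let}\ (x, y) = t_1\ \mathsf{in}\ t_2 \\
 & & \mid & \Box_s A \mid \Box t \mid \mathsf{let}\ \Box x = t_1\ \mathsf{in}\ t_2 \\
\text{(levels)} & l & ::= & 0 \mid \mathsf{suc}\ l \mid l_1 \sqcup l_2
\end{array}
\]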


Terms include variables and a constructor for an inductive hierarchy of universes, 
annotated by a level l. Dependent function types are annotated with a pair of 
grades s and r, with s capturing how x is used in the body of the inhabiting 
function and r capturing how x is used in the codomain B. Dependent tensors
have a single grade r, which describes how the first element is used in the typing
of the second. The graded modal type operator $\Box_s A$ ‘packages’ a term and its
dependencies so that values of type A can be used with grade s in the future.
Graded modal types are introduced via promotion $\Box t$ and eliminated via
$\mathsf{let}\ \Box x = t_1\ \mathsf{in}\ t_2$. The following sections explain the semantics of each piece of syntax with
respect to its typing. We typically use A and B to connote terms used as types. 


2.2 Typing Judgments, Contexts, and Grading 


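Typing judgments are written in either of the following two equivalent forms:

\[
(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A
\qquad\text{or}\qquad
\begin{pmatrix} \Delta \\ \sigma_1 \\ \sigma_2 \end{pmatrix} \odot \Gamma \vdash t : A
\]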


The ‘horizontal’ syntax (left) is used most often, with the equivalent ‘vertical’ 
form (right) used for clarity in some places. Ignoring the part to the left of $\odot$,
typing judgments and their rules are essentially those of Martin-Löf type theory
(with the addition of the modality) where $\Gamma$ ranges over usual dependently-typed
typing contexts. The left of $\odot$ provides the grading information, where $\sigma$ and $\Delta$
range over grade vectors and context grade vectors respectively, of the form:


\[
\begin{array}{lll}
\text{(contexts)} & \text{(grade vectors)} & \text{(context grade vectors)} \\
\Gamma ::= \emptyset \mid \Gamma, x : A & \sigma ::= \emptyset \mid \sigma, s & \Delta ::= \emptyset \mid \Delta, \sigma
\end{array}
\]


A grade vector $\sigma$ is a vector of semiring elements, and a context grade vector $\Delta$ is a
vector of grade vectors. We write $(s_1, \ldots, s_n)$ to denote an n-vector and likewise
for context grade vectors. We omit parentheses when this would not cause ambiguity.
Throughout, a comma is used to concatenate vectors and disjoint contexts,
and to extend vectors with a single grade, grade vector, or typing assumption.

For a judgment $(\Delta \mid \sigma_s \mid \sigma_r) \odot \Gamma \vdash t : A$ the vectors $\Gamma$, $\Delta$, $\sigma_s$, and $\sigma_r$ are
all of equal size. Given a typing assumption $y : B$ at index $i$ in $\Gamma$, the grade
$\sigma_s[i] \in R$ denotes the use of y in t (the subject of the judgment), the grade
$\sigma_r[i] \in R$ denotes the use of y in A (the subject’s type), and $\Delta[i] \in R^i$ (of size i)
describes how assumptions prior to y are used to form y’s type, B.

Consider the following example, which types the body of a function that 
takes two arguments of type a, and returns only the first: 


\[
\begin{pmatrix} (),\ (1),\ (1, 0) \\ 0, 1, 0 \\ 1, 0, 0 \end{pmatrix}
\odot a : \mathsf{Type}_l,\ x : a,\ y : a \vdash x : a
\]

Let the context grade vector be called $\Delta$. Then, $\Delta[0] = ()$ (empty vector) explains
that there are no assumptions that are used to type a in the context, as $\mathsf{Type}_l$
is a closed term and the first assumption. $\Delta[1] = (1)$ explains that the first
assumption a is used (grade 1) in the typing of x in the context, and $\Delta[2] = (1, 0)$
explains that a is used once in the typing of y in the context, and x is unused in
the typing of y. The subject grade vector $\sigma_s = (0, 1, 0)$ explains that a is unused
in the subject, x is used once, and y is unused. Finally, the subject type vector
$\sigma_r = (1, 0, 0)$ explains that a appears once in the subject’s type (which is just
a), and x and y are unused in the formation of the subject’s type.
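
The size conditions on a judgment can be made concrete with a small Python sketch. It encodes the example above as plain tuples and checks that the context, the context grade vector, and the two grade vectors line up, with the i-th context grade vector having size i. The encoding and the name `sizes_agree` are assumptions made for illustration; they do not come from the Gerty implementation.

```python
# Context a : Type, x : a, y : a with subject x and subject type a.
gamma = ["a", "x", "y"]
delta = [(), (1,), (1, 0)]   # context grade vector: delta[i] grades gamma[:i]
sigma_s = (0, 1, 0)          # subject grades: only x is used in the term
sigma_r = (1, 0, 0)          # subject-type grades: only a is used in the type


def sizes_agree(gamma, delta, sigma_s, sigma_r):
    """Check that all four vectors have equal size and delta[i] has size i."""
    n = len(gamma)
    if not (len(delta) == len(sigma_s) == len(sigma_r) == n):
        return False
    return all(len(delta[i]) == i for i in range(n))


def usage_of(name, gamma, sigma_s, sigma_r):
    """Return (subject grade, subject-type grade) of a context variable."""
    i = gamma.index(name)
    return sigma_s[i], sigma_r[i]
```

For instance, `usage_of("x", gamma, sigma_s, sigma_r)` gives the pair of grades recorded for x: used once in the subject and not at all in the subject's type.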

To aid reading, recall that standard typing rules typically have the form
context $\vdash$ subject : subject-type, the order of which is reflected by $(\Delta \mid \sigma_s \mid \sigma_r) \odot \ldots$
giving the context, subject, and subject-type grading respectively.


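Well-formed Contexts The relation $\Delta \odot \Gamma \vdash$ identifies a context $\Gamma$ as well-formed
with respect to context grade vector $\Delta$, defined by the following rules:


\[
\frac{}{\emptyset \odot \emptyset \vdash}\ \text{WF}\emptyset
\qquad
\frac{(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_l}
     {\Delta, \sigma \odot \Gamma, x : A \vdash}\ \text{WFEXT}
\]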


Unlike typing, well-formedness does not need to include subject and subject-type 
grade vectors, as it considers only the well-formedness of the assumptions in a 
context with respect to prior assumptions in the context. The WF∅ rule states
that the empty context is well-formed with an empty context grade vector as
there are no assumptions to account for. The WFEXT rule states that given A
is a type under the assumptions in $\Gamma$, with $\sigma$ accounting for the usage of $\Gamma$
variables in A, and $\Delta$ accounting for usage within $\Gamma$, then we can form the
well-formed context $\Gamma, x : A$ by extending $\Delta$ with $\sigma$ to account for the usage of A
in forming the context. The notation $\mathbf{0}$ denotes a vector for which each element
is the semiring 0. Note that the well-formedness $\Delta \odot \Gamma \vdash$ is inherent from the
premise of WFEXT due to the following lemma:


Lemma 1 (Typing contexts are well-formed). If $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A$
then $\Delta \odot \Gamma \vdash$.


2.3 Typing Rules 


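We examine the typing rules of GRTT one at a time.

Variables are introduced as follows:


\[
\frac{(\Delta_1, \sigma, \Delta_2) \odot \Gamma_1, x : A, \Gamma_2 \vdash \qquad |\Delta_1| = |\Gamma_1|}
     {(\Delta_1, \sigma, \Delta_2 \mid \mathbf{0}^{|\Delta_1|}, 1, \mathbf{0} \mid \sigma, 0, \mathbf{0}) \odot \Gamma_1, x : A, \Gamma_2 \vdash x : A}\ \text{VAR}
\]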


The premise identifies $\Gamma_1, x : A, \Gamma_2$ as well-formed under the context grade vector
$\Delta_1, \sigma, \Delta_2$. By the size condition $|\Delta_1| = |\Gamma_1|$, we are able to identify $\sigma$ as capturing
the usage of the variables $\Gamma_1$ in forming A. This information is used in the
conclusion, capturing type-level variable usage as $\sigma, 0, \mathbf{0}$, which describes that $\Gamma_1$
is used according to $\sigma$ in the subject’s type (A), and that x and the variables
of $\Gamma_2$ are used with grade 0. For subject usage, we annotate the first zero vector
with a size $|\Delta_1|$, allowing us to single out x as being the only assumption used
with grade 1 in the subject; all other assumptions are used with grade 0.
For example, typing the body of the polymorphic identity ends with VAR: 


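\[
\frac{((), (1)) \odot a : \mathsf{Type}, x : a \vdash \qquad |(())| = |a : \mathsf{Type}|}
     {((), (1) \mid 0, 1 \mid 1, 0) \odot a : \mathsf{Type}, x : a \vdash x : a}
\]


The premise implies that $(() \mid 1 \mid 0) \odot a : \mathsf{Type} \vdash a : \mathsf{Type}$ by the following lemma: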


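Lemma 2 (Typing an assumption in a well-formed context). If $\Delta_1, \sigma, \Delta_2 \odot \Gamma_1, x : A, \Gamma_2 \vdash$
with $|\Delta_1| = |\Gamma_1|$, then $(\Delta_1 \mid \sigma \mid \mathbf{0}) \odot \Gamma_1 \vdash A : \mathsf{Type}_l$, for some l.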


In the conclusion of VAR, the typing $(() \mid 1 \mid 0) \odot a : \mathsf{Type} \vdash a : \mathsf{Type}$ is ‘distributed’
to the typing of x in the context and to the formation of the subject’s type. Thus
subject grade (0,1) corresponds to the absence of a from the subject and the
presence of x, and subject-type grade (1,0) corresponds to the presence of a in
the subject’s type (a), and the absence of x.
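
The grade bookkeeping in the conclusion of VAR can be sketched in Python for the natural-number semiring. The function below takes the three parts of the context grade vector and returns the subject and subject-type grade vectors; `var_rule` and its tuple encoding are hypothetical names chosen for this illustration, not Gerty's internals.

```python
def var_rule(delta1, sigma, delta2):
    """Grades in the conclusion of VAR for a context G1, x : A, G2.

    delta1: context grade vector for G1 (one tuple per assumption)
    sigma:  how the assumptions of G1 are used to form A
    delta2: context grade vector for G2
    Returns (subject grades, subject-type grades) for the judgment typing x.
    """
    if len(sigma) != len(delta1):
        raise ValueError("sigma needs one grade per assumption before x")
    n1, n2 = len(delta1), len(delta2)
    # Subject: x is used once, everything else is unused.
    subject = (0,) * n1 + (1,) + (0,) * n2
    # Subject type: G1 is used according to sigma; x and G2 are unused.
    subject_type = tuple(sigma) + (0,) + (0,) * n2
    return subject, subject_type
```

Calling `var_rule([()], (1,), [])` reproduces the polymorphic identity example: the subject grades are `(0, 1)` and the subject-type grades are `(1, 0)`.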

Typing universes are formed as follows: 


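\[
\frac{\Delta \odot \Gamma \vdash}
     {(\Delta \mid \mathbf{0} \mid \mathbf{0}) \odot \Gamma \vdash \mathsf{Type}_l : \mathsf{Type}_{\mathsf{suc}\ l}}\ \text{TYPE}
\]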

We use an inductive hierarchy of universes [47] with ordering < such that
l < suc l. Universes can be formed under any well-formed context, with every
assumption graded with 0 subject and subject-type use, capturing the absence 
of any assumptions from the universes, which are closed forms. 


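Functions Function types $(x :_{(s,r)} A) \to B$ are annotated with two grades,
explaining that x is used with grade s in the body of the inhabiting function
and with grade r in B. Function types have the following formation rule:


\[
\frac{(\Delta \mid \sigma_1 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_{l_1} \qquad (\Delta, \sigma_1 \mid \sigma_2, r \mid \mathbf{0}) \odot \Gamma, x : A \vdash B : \mathsf{Type}_{l_2}}
     {(\Delta \mid \sigma_1 + \sigma_2 \mid \mathbf{0}) \odot \Gamma \vdash (x :_{(s,r)} A) \to B : \mathsf{Type}_{l_1 \sqcup l_2}}\ \to
\]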


The usage of the dependencies of A and B (excepting x) is given by $\sigma_1$ and $\sigma_2$
in the premises (in the ‘subject’ position) which are combined as $\sigma_1 + \sigma_2$ (via
pointwise vector addition using the + of the semiring), which serves to contract
the dependencies of the two types. The usage of x in B is captured by r, and
then internalised to the binder in the conclusion of the rule. An arbitrary grade
for s is allowed here as there is no information on how x is used in an inhabiting
function body. Function terms are then typed by the following rule:


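\[
\frac{(\Delta, \sigma_1 \mid \sigma_3, r \mid \mathbf{0}) \odot \Gamma, x : A \vdash B : \mathsf{Type}_l \qquad (\Delta, \sigma_1 \mid \sigma_2, s \mid \sigma_3, r) \odot \Gamma, x : A \vdash t : B}
     {(\Delta \mid \sigma_2 \mid \sigma_1 + \sigma_3) \odot \Gamma \vdash \lambda x.t : (x :_{(s,r)} A) \to B}\ \lambda_i
\]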


The second premise types the body of the $\lambda$-term, showing that s captures the
usage of x in t and r captures the usage of x in B; the subject and subject-type
grades of x are then internalised as annotations on the function type’s binder.
Dependent functions are eliminated through application:


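\[
\frac{\begin{array}{c}
(\Delta, \sigma_1 \mid \sigma_3, r \mid \mathbf{0}) \odot \Gamma, x : A \vdash B : \mathsf{Type}_l \\
(\Delta \mid \sigma_2 \mid \sigma_1 + \sigma_3) \odot \Gamma \vdash t_1 : (x :_{(s,r)} A) \to B \qquad (\Delta \mid \sigma_4 \mid \sigma_1) \odot \Gamma \vdash t_2 : A
\end{array}}
{(\Delta \mid \sigma_2 + s * \sigma_4 \mid \sigma_3 + r * \sigma_4) \odot \Gamma \vdash t_1\, t_2 : [t_2/x]B}\ \lambda_e
\]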


where * is the scalar multiplication of a vector, using the semiring multiplication.
Given a function $t_1$ which uses its parameter with grade s to compute and with
grade r in the typing of the result, we can apply it to a term $t_2$, provided that
we have the resources required to form $t_2$ scaled by s at the subject level and by
r at the subject-type level, since $t_2$ is substituted into the return type B. This
scaling behaviour is akin to that used in coeffect calculi [25,49], QTT [6,43] and
Linear Haskell [8], but scalar multiplication happens here at both the subject and
subject-type level. The use of variables in A is accounted for by $\sigma_1$, as explained
in the third premise, but these usages are not present in the resulting application
since A no longer appears in the types or the terms.

Consider the constant function $\lambda x.\lambda y.x : (x :_{(1,0)} A) \to (y :_{(0,0)} B) \to A$ (for
some A and B). Here the resources required for the second parameter will always 
be scaled by 0, which is absorbing, meaning that anything passed as the second 
argument has 0 subject and subject-type use. This example begins to show some 
of the power of grading—the grades capture the program structure at all levels. 
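
The arithmetic in the conclusion of the application rule can be sketched for the natural-number semiring as follows. The helper names `vadd`, `scale`, and `app_grades` are illustrative assumptions; the sketch computes only the grade vectors $\sigma_2 + s * \sigma_4$ and $\sigma_3 + r * \sigma_4$ and performs no type checking.

```python
def vadd(u, v):
    """Pointwise vector addition using the semiring +."""
    if len(u) != len(v):
        raise ValueError("grade vectors must have equal size")
    return tuple(a + b for a, b in zip(u, v))


def scale(s, v):
    """Scalar multiplication of a grade vector using the semiring *."""
    return tuple(s * a for a in v)


def app_grades(sigma2, sigma3, sigma4, s, r):
    """Subject and subject-type grades of an application t1 t2.

    sigma2: subject grades of the function t1
    sigma3: usage of the context in the codomain B
    sigma4: subject grades of the argument t2
    s, r:   the grades on the function's binder
    """
    return vadd(sigma2, scale(s, sigma4)), vadd(sigma3, scale(r, sigma4))
```

With a binder graded (0, 0), as for the second parameter of the constant function, the argument's resources vanish: `app_grades((1, 0), (0, 0), (0, 3), 0, 0)` returns `((1, 0), (0, 0))`, whereas a binder graded (2, 1) would give `((1, 6), (0, 3))`.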


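Tensors The rule for forming dependent tensor types is as follows:


\[
\frac{(\Delta \mid \sigma_1 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_{l_1} \qquad (\Delta, \sigma_1 \mid \sigma_2, r \mid \mathbf{0}) \odot \Gamma, x : A \vdash B : \mathsf{Type}_{l_2}}
     {(\Delta \mid \sigma_1 + \sigma_2 \mid \mathbf{0}) \odot \Gamma \vdash (x :_r A) \otimes B : \mathsf{Type}_{l_1 \sqcup l_2}}\ \otimes
\]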


This rule is almost identical to function type formation ($\to$) but with only a single
grade r on the binder, since x is only bound in B (the type of the second
component), and not computationally. For ‘quantitative’ semirings, where 0 really
means unused (see Section 3), $(x :_0 A) \otimes B$ is then a product $A \times B$.
Dependent tensors are introduced as follows:


\[
\frac{\begin{array}{c}
(\Delta, \sigma_1 \mid \sigma_3, r \mid \mathbf{0}) \odot \Gamma, x : A \vdash B : \mathsf{Type}_l \\
(\Delta \mid \sigma_2 \mid \sigma_1) \odot \Gamma \vdash t_1 : A \qquad (\Delta \mid \sigma_4 \mid \sigma_3 + r * \sigma_2) \odot \Gamma \vdash t_2 : [t_1/x]B
\end{array}}
{(\Delta \mid \sigma_2 + \sigma_4 \mid \sigma_1 + \sigma_3) \odot \Gamma \vdash (t_1, t_2) : (x :_r A) \otimes B}\ \otimes_i
\]

In the typing premise for $t_2$, occurrences of x are replaced with $t_1$ in the type,
ensuring that the type of the second component ($t_2$) is calculated using the
first component ($t_1$). The resources for $t_1$ in this substitution are scaled by r,
accounting for the existing usage of x in B. In the conclusion, we see the resources
for the two components (and their types) combined via the semiring addition.
Finally, tensors are eliminated with the following rule: 


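\[
\frac{\begin{array}{c}
(\Delta \mid \sigma_3 \mid \sigma_1 + \sigma_2) \odot \Gamma \vdash t_1 : (x :_r A) \otimes B \\
(\Delta, (\sigma_1 + \sigma_2) \mid \sigma_5, r' \mid \mathbf{0}) \odot \Gamma, z : (x :_r A) \otimes B \vdash C : \mathsf{Type}_l \\
(\Delta, \sigma_1, (\sigma_2, r) \mid \sigma_4, s, s \mid \sigma_5, r', r') \odot \Gamma, x : A, y : B \vdash t_2 : [(x, y)/z]C
\end{array}}
{(\Delta \mid \sigma_4 + s * \sigma_3 \mid \sigma_5 + r' * \sigma_3) \odot \Gamma \vdash \mathsf{let}\ (x, y) = t_1\ \mathsf{in}\ t_2 : [t_1/z]C}\ \otimes_e
\]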


As this is a dependent eliminator, we allow the result type C to depend upon
the value of the tensor as a whole, bound as z in the second premise with grade
r′, into which is substituted our actual tensor term $t_1$ in the conclusion.

Eliminating a tensor ($t_1$) requires that each component (x and y) is used
with the same grade s in the resulting expression $t_2$, and that we scale
the resources of $t_1$ by s. This is because we cannot inspect $t_1$ itself, and semiring
addition is not injective (preventing us from splitting the grades required to
form $t_1$). This prevents forming certain functions (e.g., projections) under some
semirings, but this can be overcome by the introduction of graded modalities.


Graded Modality Graded binders alone do not allow different parts of a value
to be used differently, e.g., computing the length of a list ignores the elements,
projecting from a pair discards one component. We therefore introduce a graded
modality (à la [10,46]) allowing us to capture the notion of local inspection on
data and internalising usage information into types. A type $\Box_s A$ denotes terms
of type A that are used with grade s. Type formation and introduction rules are:


\[
\frac{(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_l}
     {(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash \Box_s A : \mathsf{Type}_l}\ \Box
\qquad
\frac{(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A}
     {(\Delta \mid s * \sigma_1 \mid \sigma_2) \odot \Gamma \vdash \Box t : \Box_s A}\ \Box_i
\]


To form a term of type $\Box_s A$, we ‘promote’ a term t of type A by requiring that
we can use the resources used to form t ($\sigma_1$) according to grade s. This
‘promotion’ resembles that of other graded modal systems (e.g., [3,10,23,46]), but the
elimination needs to also account for type usage due to dependent elimination.

We can see promotion $\Box t$ as capturing t for later use according to grade s.
Thus, when eliminating a term of type $\Box_s A$, we must consider how the ‘unboxed’
term is used with grade s, as per the following dependent eliminator:


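\[
\frac{\begin{array}{c}
(\Delta, \sigma_2 \mid \sigma_4, r \mid \mathbf{0}) \odot \Gamma, z : \Box_s A \vdash B : \mathsf{Type}_l \\
(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_1 : \Box_s A \qquad (\Delta, \sigma_2 \mid \sigma_3, s \mid \sigma_4, (s * r)) \odot \Gamma, x : A \vdash t_2 : [\Box x/z]B
\end{array}}
{(\Delta \mid \sigma_1 + \sigma_3 \mid \sigma_4 + r * \sigma_1) \odot \Gamma \vdash \mathsf{let}\ \Box x = t_1\ \mathsf{in}\ t_2 : [t_1/z]B}\ \Box_e
\]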


This rule can be understood as a kind of ‘cut’, connecting a ‘capability’ to use
a term of type A according to grade s with the requirement that x : A is used
according to grade s as a dependency of $t_2$. Since we are in a dependently-typed
setting, we also substitute $t_1$ into the type level such that B can depend on
$t_1$ according to grade r which then causes the dependencies of $t_1$ ($\sigma_1$) to be
scaled-up by r and added to the subject-type grading.


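Equality, Conversion, and Subtyping A key part of dependent type theories is
a notion of term equality and type conversion [33]. GRTT term equality is via
judgments $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_1 = t_2 : A$ equating terms $t_1$ and $t_2$ of type A.
Equality includes full congruences as well as $\beta\eta$-equality for functions, tensors,
and graded modalities, of which the latter are:

\[
\frac{\begin{array}{c}
(\Delta, \sigma_2 \mid \sigma_4, r \mid \mathbf{0}) \odot \Gamma, z : \Box_s A \vdash B : \mathsf{Type}_l \\
(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_1 : A \qquad (\Delta, \sigma_2 \mid \sigma_3, s \mid \sigma_4, (s * r)) \odot \Gamma, x : A \vdash t_2 : [\Box x/z]B
\end{array}}
{(\Delta \mid \sigma_3 + s * \sigma_1 \mid \sigma_4 + s * r * \sigma_1) \odot \Gamma \vdash (\mathsf{let}\ \Box x = \Box t_1\ \mathsf{in}\ t_2) = [t_1/x]t_2 : [\Box t_1/z]B}
\]

\[
\frac{(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : \Box_s A}
     {(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t = (\mathsf{let}\ \Box x = t\ \mathsf{in}\ \Box x) : \Box_s A}
\]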


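A subtyping relation ($(\Delta \mid \sigma) \odot \Gamma \vdash A \le B$) subsumes equality, adding ordering
of universe levels. Type conversion allows re-typing terms based on the judgment:


\[
\frac{(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A \qquad (\Delta \mid \sigma_2) \odot \Gamma \vdash A \le B}
     {(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : B}\ \text{CONV}
\]

The full rules for equality and subtyping are in this paper’s extended version [44].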


2.4 Operational Semantics 


As with other graded modal calculi (e.g., [3,10,23]), the core calculus of GRTT
has a Call-by-Name small-step operational semantics with reductions $t \leadsto t'$.
The rules are standard, with the addition of the $\beta$-rule for the graded modality:


\[ \mathsf{let}\ \Box x = \Box t_1\ \mathsf{in}\ t_2 \leadsto [t_1/x]t_2 \qquad (\beta\Box) \]


Type preservation and normalisation are considered in Section 4. 
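
To make the reduction rule for the graded modality concrete, here is a minimal Python sketch of call-by-name head reduction over an untyped fragment with functions and boxes. The tuple encoding of terms, the names `subst` and `step`, and the choice of congruence rules (reduce the function position and the scrutinee of a let) are assumptions made for this illustration; grades and types are ignored, and substitution assumes bound names do not clash with free names of the substituted term.

```python
# Hypothetical term encoding:
#   ("var", x) | ("lam", x, body) | ("app", f, a)
#   ("box", t) | ("letbox", x, t1, t2)   for  let [x] = t1 in t2


def subst(t, x, v):
    """[v/x]t, assuming no bound name in t clashes with a free name of v."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    if tag == "app":
        return ("app", subst(t[1], x, v), subst(t[2], x, v))
    if tag == "box":
        return ("box", subst(t[1], x, v))
    if tag == "letbox":
        body = t[3] if t[1] == x else subst(t[3], x, v)
        return ("letbox", t[1], subst(t[2], x, v), body)
    raise ValueError(f"unknown term: {t!r}")


def step(t):
    """Perform one call-by-name step, or return None if none applies."""
    tag = t[0]
    if tag == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":              # beta for functions: argument unevaluated
            return subst(f[2], f[1], a)
        f2 = step(f)
        return None if f2 is None else ("app", f2, a)
    if tag == "letbox":
        x, t1, t2 = t[1], t[2], t[3]
        if t1[0] == "box":             # beta for the graded modality
            return subst(t2, x, t1[1])
        t1b = step(t1)
        return None if t1b is None else ("letbox", x, t1b, t2)
    return None
```

For example, stepping `("letbox", "x", ("box", ("var", "y")), ("app", ("var", "x"), ("var", "x")))` substitutes the boxed term for both occurrences of x in the body.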


2.5 Implementation and Examples


To explore our theory, we provide an implementation, Gerty. Section 5 describes 
how the declarative definition of the type theory is implemented as a bidirectional 
type checking algorithm. We briefly mention the syntax here for use in later 
examples. The following is the polymorphic identity function in Gerty: 


id: (a: (.0, .2) Type 0) -> (x: (.1, .0) a) -> a 
id = \a -> \x -> x 


The syntax resembles the theory, where grading terms .n are syntactic sugar for 
a unary encoding of grades in terms of 0 and repeated addition of 1, e.g., .2 = 
(.0 + .1) + .1. This syntax can be used for grade terms of any semiring, which 
can be resolved to particular built-in semirings at other points of type checking. 

The following shows first projection on (non-dependent) pairs, using the
graded modality (at grade 0 here) to give fine-grained usage on compound data:


fst : (a: (.0, .2) Type 0) (b : (.0, .1) Type 0) -> <a * [.0] b> -> a
fst = \a b p -> case p of <x, y> -> let [z] = y in x


The implementation adds various built-in semirings, some syntactic sugar, and 
extras such as: a singleton unit type, extensions of the theory to semirings with 
a pre-ordering (discussed further in Section 6), and some implicit resolution. 
Anywhere a grade is expected, an underscore can be supplied to indicate that 
Gerty should try to resolve the grade implicitly. Grades may also be omit- 
ted from binders (see above in fst), in which case they are treated as implicits. 
Currently, implicits are handled by generating existentially quantified grade vari- 
ables, and using SMT to solve the necessary constraints (see Section 5). 

So far we have considered the natural numbers semiring providing an analy- 
sis of usage. We come back to this and similar examples in Section 3. To show 
another kind of example, we consider a lattice semiring of privacy levels (appear- 
ing elsewhere [3,23,46]) which enforces information-flow control, akin to DCC [1]. 
Differently to DCC, dataflow is tracked through variable dependencies, rather 
than through the results of computations in the monadic style of DCC. 


Definition 1. [Security levels] Let $R = \{\mathsf{Lo} \le \mathsf{Hi}\}$ be a set of labels with 0 = Hi
and 1 = Lo, semiring addition as the meet and multiplication as join. Here, 1 = Lo 
treats the base notion of dataflow as being in the low security (public) domain. 
Variables graded with Hi must then be unused, or guarded by a graded modality. 
This semiring is primitive in Gerty; we can express the following example: 


idLo : (a: (.0, .2) Type 0) -> (x : (Lo, Hi) a) -> a 
idLo = \a -> \x -> x 

-- The following is rejected as ill-typed 

leak : (a : (.0, .2) Type 0) -> (x : (Hi, Hi) a) -> a 
leak = \a -> \x -> idLo a x 


The first definition is well-typed, but the second yields a typing error originating 
from the application in its body: 


At subject stage got the following mismatched grades: 
For ’x’ expected Hi but got .1 


where grade 1 is Lo here. Thus we can use this abstract label semiring as a way 
of restricting flow of data between regions (cf. region typing systems [31,55]). 
Note that the ordering is not leveraged here other than in the lattice operations. 
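
The security-level semiring of Definition 1 is small enough to check exhaustively. The Python sketch below encodes the two labels, takes addition to be meet and multiplication to be join, and verifies the semiring laws by brute force. The names `meet`, `join`, and `semiring_laws_hold` are illustrative assumptions, and the final lines only mirror the grade mismatch reported for leak; they do not reproduce how Gerty itself compares grades.

```python
from itertools import product

LO, HI = "Lo", "Hi"
RANK = {LO: 0, HI: 1}    # encodes the ordering Lo <= Hi


def meet(r, s):
    """Greatest lower bound of two labels."""
    return r if RANK[r] <= RANK[s] else s


def join(r, s):
    """Least upper bound of two labels."""
    return r if RANK[r] >= RANK[s] else s


ZERO, ONE = HI, LO       # 0 = Hi and 1 = Lo, as in Definition 1
add, mul = meet, join    # addition is meet, multiplication is join


def semiring_laws_hold(elems):
    """Brute-force check of the semiring laws over a finite carrier."""
    for r, s, t in product(elems, repeat=3):
        if add(add(r, s), t) != add(r, add(s, t)):
            return False
        if mul(mul(r, s), t) != mul(r, mul(s, t)):
            return False
        if mul(r, add(s, t)) != add(mul(r, s), mul(r, t)):
            return False
        if mul(add(r, s), t) != add(mul(r, t), mul(s, t)):
            return False
    if any(add(r, s) != add(s, r) for r, s in product(elems, repeat=2)):
        return False
    return all(
        add(r, ZERO) == r
        and mul(r, ONE) == r == mul(ONE, r)
        and mul(r, ZERO) == ZERO == mul(ZERO, r)
        for r in elems
    )


# In the body `idLo a x`, x is used once (grade 1 = Lo), scaled by idLo's Lo.
grade_of_x_in_leak_body = mul(LO, ONE)
```

Here `grade_of_x_in_leak_body` evaluates to Lo, which differs from the Hi declared on the binder of x in leak, matching the shape of the reported mismatch.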


3 Case Studies 


We now demonstrate GRTT via several case studies that focus the reasoning
power of dependent types via grading. Since grading in GRTT serves to explain
dataflow, we can characterise subsets of GRTT that correspond to various type
theories. We demonstrate the approach with simple types, parametric
polymorphism, and linearity. In each case study, we restrict GRTT to a subset by a
characterisation of the grades, rather than by, say, placing detailed syntactic
restrictions or employing meta-level operations or predicates that restrict syntax
(as one might do for example to map a subset of Martin-Löf type theory into the
simply-typed $\lambda$-calculus by restriction to closed types, requiring deep inspection
of type terms). Since this restriction is only on grades, we can harness the specific
reasoning power of particular calculi from within the language itself, simply by
specifications on grades. In the context of an implementation like Gerty, this
amounts to using type signatures to restrict dataflow.

This section shows the power of tracking dataflow in types via grades, going 
beyond QTT [6] and GRAD [13]. For ‘quantitative’ semirings, a 0 type-grade 
means that we can recover simply-typed reasoning (Section 3.3) and distinguish 
computational functions from type-parameter functions for parametric reasoning 
(Section 3.4), embedding a grade-restricted subset of GRTT into System F.

Section 5 returns to a case study that builds on the implementation. 


3.1 Recovering Martin-Löf Type Theory


When the semiring parameterising GRTT is the singleton semiring (i.e., any
semiring where 1 = 0), we have an isomorphism $\Box_s A \cong A$, and grade annotations
become redundant, as all grades are equal. All vectors and grades on binders may
then be omitted, and we can write typing judgments as $\Gamma \vdash t : A$, giving rise to
a standard Martin-Löf type theory as a special case of GRTT.


3.2 Determining Usage via Quantitative Semirings 


Unlike existing systems, we can use the fine-grained grading to guarantee the 
relevance or irrelevance of assumptions in types. To do this we must consider a 
subset of semirings (R,*,1,+,0) called quantitative semirings, satisfying: 


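\[
\begin{array}{ll}
\text{(zero-unique)} & 1 \neq 0; \\
\text{(positivity)} & \forall r, s.\ r + s = 0 \implies r = 0 \land s = 0; \\
\text{(zero-product)} & \forall r, s.\ r * s = 0 \implies r = 0 \lor s = 0.
\end{array}
\]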


These axioms⁴ ensure that a 0-grade in a quantitative semiring represents
irrelevant variable use. This notion has recently been proved for computational use
by Choudhury et al. [13] via a heap-based semantics for grading (on
computations) and the same result applies here. Conversely, in a quantitative semiring
any grade other than 0 denotes relevance. From this, we can directly encode
non-dependent tensors and arrows: in $(x :_0 A) \otimes B$ the grade 0 captures that x
cannot have any computational content in B, and likewise for $(x :_{(s,0)} A) \to B$
the grade 0 explains that x cannot have any computational content in B, but
may have computational use according to s in the inhabiting function. Thus,
the grade 0 here describes that elimination forms cannot ever inspect the
variable during normalisation. Additionally, quantitative semirings can be used for
encoding simply-typed and polymorphic reasoning.


4 Atkey requires positivity and zero-product for all semirings parameterising QTT [6]
(as does Abel [2]). Atkey imposes this for admissibility of substitution. We need not
place this restriction on GRTT to have substitution in general (Sec. 4.1).


Example 1. Some quantitative semirings are:

— (Exact usage) $(\mathbb{N}, \times, 1, +, 0)$;

— (0-1) The semiring over $R = \{0, 1\}$ with 1 + 1 = 1 which describes relevant
vs. irrelevant dependencies, but no further information.

— (None-One-Tons [43]) The semiring on $R = \{0, 1, \infty\}$ is more fine-grained
than 0-1, where $\infty$ represents more than 1 usage, with $1 + 1 = \infty = 1 + \infty$.
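
For finite carriers, the three quantitative-semiring axioms can be checked by enumeration. The Python sketch below does this for the 0-1 and None-One-Tons semirings, and for the integers modulo 2 as a non-example. The function `is_quantitative` is a hypothetical helper; the multiplication table used for None-One-Tons (0 absorbing, 1 neutral, and tons times tons giving tons) is an assumption, since only its addition is spelled out above.

```python
from itertools import product

INF = "inf"   # stands for the 'tons' grade


def is_quantitative(elems, zero, one, add, mul):
    """Check zero-unique, positivity, and zero-product by enumeration."""
    pairs = list(product(elems, repeat=2))
    zero_unique = one != zero
    positivity = all(r == zero and s == zero
                     for r, s in pairs if add(r, s) == zero)
    zero_product = all(r == zero or s == zero
                       for r, s in pairs if mul(r, s) == zero)
    return zero_unique and positivity and zero_product


# 0-1 semiring: 1 + 1 = 1.
def bool_add(r, s):
    return max(r, s)


def bool_mul(r, s):
    return min(r, s)


# None-One-Tons: 1 + 1 = inf = 1 + inf.
def not_add(r, s):
    if r == 0:
        return s
    if s == 0:
        return r
    return INF


def not_mul(r, s):
    if r == 0 or s == 0:
        return 0
    if r == 1:
        return s
    if s == 1:
        return r
    return INF


# Non-example: integers modulo 2, where 1 + 1 = 0 breaks positivity.
def mod2_add(r, s):
    return (r + s) % 2


def mod2_mul(r, s):
    return (r * s) % 2
```

The modulo-2 case fails because two relevant uses would add up to the grade 0, so a 0 grade could no longer be read as irrelevance.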


3.3 Simply-typed Reasoning 


As discussed in Section 1, the simply-typed $\lambda$-calculus (STLC) can be
distinguished from dependently-typed calculi via the restriction of dataflow: in simple
types, data can only flow at the computational level, with no dataflow within,
into, or from types. We can thus view a GRTT function as simply typed when its
variable is irrelevant in the type, e.g., $(x :_{(s,0)} A) \to B$ for quantitative semirings.
We define a subset of GRTT restricted to simply-typed reasoning:


Definition 2. [Simply-typed GRTT] For a quantitative semiring, the following
predicate STLC(−) determines a subset of simply-typed GRTT programs:

\[
\begin{array}{l}
\mathrm{STLC}((\emptyset \mid \emptyset \mid \emptyset) \odot \emptyset \vdash t : A) \\
\mathrm{STLC}((\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A) \implies \mathrm{STLC}((\Delta, \mathbf{0} \mid \sigma_1, s \mid \sigma_2, 0) \odot \Gamma, x : B \vdash t : A)
\end{array}
\]

That is, all subject-type grades are 0 (thus function types are of the form
$(x :_{(s,0)} A) \to B$). A similar predicate is defined on well-formed contexts (elided),
restricting context grades of well-formed contexts to only zero grading vectors.
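
Read on grade vectors alone, the predicate of Definition 2 asks that every context grade vector and every subject-type grade is zero, while subject grades are unconstrained. The Python sketch below checks exactly this for the natural-number semiring; the name `is_stlc` and the tuple encoding of grades are assumptions made for illustration.

```python
def is_stlc(delta, sigma_s, sigma_r, zero=0):
    """Check the grade side of the STLC predicate for one judgment.

    delta:   context grade vector (one tuple per assumption)
    sigma_s: subject grades (unconstrained by the predicate)
    sigma_r: subject-type grades (must all be zero)
    """
    if not (len(delta) == len(sigma_s) == len(sigma_r)):
        raise ValueError("grade vectors must have equal size")
    context_is_zero = all(g == zero for vec in delta for g in vec)
    type_level_is_zero = all(g == zero for g in sigma_r)
    return context_is_zero and type_level_is_zero
```

A judgment such as x : A, y : B ⊢ x : A over closed types, graded `([(), (0,)], (1, 0), (0, 0))`, passes the check, whereas the body of the polymorphic identity, graded `([(), (1,)], (0, 1), (1, 0))`, does not, because the type variable flows into both the context and the subject's type.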

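Under the restriction of Definition 2, a subset of GRTT terms embeds into
the simply-typed $\lambda$-calculus in a sound and complete way. Since STLC does not
have a notion of tensor or modality, these are omitted from the encoding:


\[
\llbracket x \rrbracket = x \qquad
\llbracket \lambda x.t \rrbracket = \lambda x.\llbracket t \rrbracket \qquad
\llbracket t_1\, t_2 \rrbracket = \llbracket t_1 \rrbracket\, \llbracket t_2 \rrbracket \qquad
\llbracket (x :_{(s,0)} A) \to B \rrbracket_\tau = \llbracket A \rrbracket_\tau \to \llbracket B \rrbracket_\tau
\]


Variable contexts of GRTT are interpreted by point-wise applying $\llbracket - \rrbracket_\tau$ to typing
assumptions. We then get the following preservation of typing into the
simply-typed $\lambda$-calculus, and soundness and completeness of this encoding: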


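Lemma 3 (Soundness of typing). Given a derivation of $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A$
such that $\mathrm{STLC}((\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A)$ then $\llbracket \Gamma \rrbracket_\tau \vdash \llbracket t \rrbracket : \llbracket A \rrbracket_\tau$ in STLC.

Theorem 1 (Soundness and completeness of the embedding). Given
$\mathrm{STLC}((\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A)$ and $\llbracket (\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A \rrbracket$ then for CBN
reduction $\leadsto^{\mathrm{STLC}}$ in simply-typed $\lambda$-calculus:

(soundness) $\forall t'.$ if $t \leadsto t'$ then $\llbracket t \rrbracket \leadsto^{\mathrm{STLC}} \llbracket t' \rrbracket$
(completeness) $\forall t_a.$ if $\llbracket t \rrbracket \leadsto^{\mathrm{STLC}} t_a$ then $\exists t'.\ t \leadsto t' \land \llbracket t' \rrbracket \equiv_{\beta\eta} t_a$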


Thus, we capture simply-typed reasoning just by restricting type grades to 0 for
quantitative semirings. We consider quantitative semirings again for parametric
reasoning, but first recall issues with parametricity and dependent types.


3.4 Recovering Parametricity via Grading


One powerful feature of grading in a dependent type setting is the ability to 
recover parametricity from dependent function types. Consider the following 
type of functions in System F (we borrow this example from Nuyts et al. [45]): 


\[ \mathsf{RI}\ A\ B \triangleq \forall\gamma.\ (\gamma \to A) \to (\gamma \to B) \]


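Due to parametricity, we get the following notion of representation independence
in System F: for a function $f : \mathsf{RI}\ A\ B$, some type $\gamma'$, and terms $h : \gamma' \to A$
and $c : \gamma'$, then we know that f can only use c by applying h c. Subsequently,
$\mathsf{RI}\ A\ B \cong A \to B$ by parametricity [52], defined uniquely as:


\[
\begin{array}{ll}
\mathit{iso} : \mathsf{RI}\ A\ B \to (A \to B) & \mathit{iso}^{-1} : (A \to B) \to \mathsf{RI}\ A\ B \\
\mathit{iso}\ f = f\ A\ (\mathit{id}\ A) & \mathit{iso}^{-1}\ g = \Lambda\gamma.\ \lambda h.\ \lambda(c : \gamma).\ g\ (h\ c)
\end{array}
\]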
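
The two maps of the isomorphism can be transcribed into untyped Python as a sanity check of the round trips. This is only a sketch: type abstraction and application are erased, so the code cannot express or enforce parametricity, and the names `iso` and `iso_inv` simply follow the definitions above.

```python
def iso(f):
    """iso f = f A (id A); the type argument is erased here."""
    return f(lambda a: a)


def iso_inv(g):
    """iso^{-1} g = (type abstraction erased) \\h. \\c. g (h c)."""
    return lambda h: lambda c: g(h(c))


def example_g(n):
    """A sample function A -> B with A = int and B = int."""
    return n + 1


# A parametric inhabitant of RI A B built from str : int -> str.
example_f = iso_inv(str)
```

For instance, `iso(iso_inv(example_g))(41)` agrees with `example_g(41)`, and `iso_inv(iso(example_f))(len)("abc")` agrees with `example_f(len)("abc")`, as the round trips predict for inhabitants that only use c through h.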


In a dependently-typed language, one might seek to replace System F’s universal
quantifier with $\Pi$-types, i.e.


\[ \mathsf{RI}'\ A\ B \triangleq (\gamma : \mathsf{Type}) \to (\gamma \to A) \to (\gamma \to B) \]


However, we can no longer reason parametrically about the inhabitants of such
types (we cannot prove that $\mathsf{RI}'\ A\ B \cong A \to B$) as the free interaction of types
and computational terms allows us to give the following non-parametric element
of $\mathsf{RI}'\ A\ B$ over ‘large’ type instances:


\[ \mathit{leak} = \lambda\gamma.\ \lambda h.\ \lambda c.\ \gamma : \mathsf{RI}'\ A\ \mathsf{Type} \]


Instead of applying h c, the above “leaks” the type parameter $\gamma$. GRTT can
recover universal quantification, and hence parametric reasoning, by using grading
to restrict the data-flow capabilities of a $\Pi$-type. We can refine representation
independence to the following:


\[ \mathsf{RI}''\ A\ B \triangleq (\gamma :_{(0,2)} \mathsf{Type}) \to (h :_{(s_1,0)} (x :_{(s_2,0)} \gamma) \to A) \to (c :_{(s_3,0)} \gamma) \to B \]


for some grades $s_1$, $s_2$, and $s_3$, and with shorthand 2 = 1 + 1.

If we look at the definition of leak above, we see that $\gamma$ is used in the body
of the function and thus requires usage 1, so leak cannot inhabit $\mathsf{RI}''\ A\ \mathsf{Type}$.
Instead, leak would be typed differently as:


\[ \mathit{leak} : (\gamma :_{(1,2)} \mathsf{Type}) \to (h :_{(0,0)} (x :_{(s,0)} \gamma) \to A) \to (c :_{(0,0)} \gamma) \to \mathsf{Type} \]


The problematic behaviour (that the type parameter $\gamma$ is returned by the inner
function) is exposed by the subject grade 1 on the binder of $\gamma$. We can thus
define a graded universal quantification from a graded $\Pi$-type:


\[ \forall_r (\gamma : A).\, B \triangleq (\gamma :_{(0,r)} A) \to B \qquad (1) \]


This denotes that the type parameter y can appear freely in B described by 
grade r, but is irrelevant in the body of any corresponding A-abstraction. This is 
akin to the work of Nuyts et al. who develop a system with several modalities for 
regaining parametricity within a dependent type theory [45]. Note however that 
parametricity is recovered for us here as one of many possible options coming 
from systematically specialising the grading. 
Capturing Existential Types With the ability to capture universal quantification, we
can similarly define existentials (allowing, e.g., abstraction [11]). We define the
existential type via a Church-encoding as follows:

\[ \exists_r (x : A).\, B \triangleq \forall_2 (C : \mathsf{Type}_l).\ (f :_{(1,0)} \forall_r (x : A).\ (b :_{(s,0)} B) \to C) \to C \]


Embedding into Stratified System F We show that parametricity is regained here
(and thus eqn. (1) really behaves as a universal quantifier and not a general
$\Pi$-type) by showing that we can embed a subset of GRTT into System F, based
solely on a classification of the grades. We follow a similar approach to Section 3.3
for simply-typed reasoning but rather than defining a purely syntactic encoding
(and then proving it type sound) our encoding is type directed since we embed
GRTT functions of type $(x :_{(0,r)} \mathsf{Type}_l) \to B$ as universal types in System F with
corresponding type abstractions ($\Lambda$) as their inhabitants. Since GRTT employs
a predicative hierarchy of universes, we target Stratified System F (hereafter
SSF) since it includes the analogous inductive hierarchy of kinds [38]. We use
the formulation of Eades and Stump [21] with terms $t_s$ and types $T$:

\[ t_s ::= x \mid \lambda(x : T).t_s \mid t_s\ t_s' \mid \Lambda(X : K).t_s \mid t_s\,[T] \qquad T ::= X \mid T \to T' \mid \forall(X : K).T \]

with kinds $K ::= \star_l$ where $l \in \mathbb{N}$ providing the stratified kind hierarchy.
Capitalised variables $X$ are System F type variables and $t_s\,[T]$ is type application.
Contexts may contain both type and computational variables, and so free-variable
type assumptions may have dependencies, akin to dependent type systems.
Kinding is via judgments $\Gamma \vdash T : \star_l$ and typing via $\Gamma \vdash t : T$.

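We define a type directed encoding on a subset of GRTT typing derivations
characterised by the following predicate:

\[
\begin{array}{l}
\mathsf{Ssf}((\emptyset \mid \emptyset \mid \emptyset) \odot \emptyset \vdash t : A) \\[2pt]
\mathsf{Ssf}((\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A)
  \implies \mathsf{Ssf}((\Delta, \mathbf{0} \mid \sigma_1, 0 \mid \sigma_2, r) \odot \Gamma, x : \mathsf{Type}_l \vdash t : A) \\[2pt]
\mathsf{Ssf}((\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A) \wedge \mathsf{Type}_l \notin^{+ve} B \\
\quad \implies \mathsf{Ssf}((\Delta, \sigma_3 \mid \sigma_1, s \mid \sigma_2, 0) \odot \Gamma, x : B \vdash t : A)
\end{array}
\]

By $\mathsf{Type}_l \notin^{+ve} B$ we mean $\mathsf{Type}_l$ is not a positive subterm of $B$, avoiding
higher-order typing terms (e.g., type constructors) which do not exist in SSF.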

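Under this restriction, we give a type-directed encoding mapping derivations
of GRTT to SSF: given a GRTT derivation of judgment $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A$
we have that $\exists t_s$ (an SSF term) such that there is a derivation of judgment
$[\![\Gamma]\!] \vdash t_s : [\![A]\!]_\tau$ in SSF where we interpret a subset of GRTT terms $A$ as types:

\[
\begin{array}{lcll}
[\![x]\!]_\tau & = & x & \\
[\![\mathsf{Type}_l]\!]_\tau & = & \star_l & \\
[\![(x :_{(0,r)} \mathsf{Type}_l) \to B]\!]_\tau & = & \forall x : \star_l.\ [\![B]\!]_\tau & \text{where } \mathsf{Type}_l \notin^{+ve} B \\
[\![(x :_{(s,0)} A) \to B]\!]_\tau & = & [\![A]\!]_\tau \to [\![B]\!]_\tau & \text{where } \mathsf{Type}_l \notin^{+ve} A, B
\end{array}
\]

Thus, dependent functions with Type parameters that are computationally irrelevant
(subject grade 0) map to $\forall$ types, and dependent functions with parameters
irrelevant in types (subject-type grade 0) map to regular function types.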
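
The four equations of the type translation can be read as a small recursive function. The following Python sketch is one hypothetical rendering of it: the class names (`Var`, `TypeL`, `Pi`, `Star`, `Forall`, `Arrow`) and the function `encode` are invented for illustration, grades are assumed to be natural numbers, and the positivity side condition on $\mathsf{Type}_l$ is deliberately not checked.

```python
from dataclasses import dataclass

# GRTT types (only the fragment covered by the encoding)
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class TypeL:            # Type_l
    level: int

@dataclass(frozen=True)
class Pi:               # (x :_(s, r) A) -> B
    var: str
    s: int              # subject grade
    r: int              # subject-type grade
    dom: object
    cod: object

# SSF kinds and types
@dataclass(frozen=True)
class Star:             # kind *_l
    level: int

@dataclass(frozen=True)
class Forall:           # forall (x : *_l). T
    var: str
    kind: Star
    body: object

@dataclass(frozen=True)
class Arrow:            # T -> T'
    dom: object
    cod: object

def encode(ty):
    """Translate a GRTT type following the four equations above.
    The side condition on positive occurrences of Type_l is NOT checked."""
    if isinstance(ty, Var):
        return ty
    if isinstance(ty, TypeL):
        return Star(ty.level)
    if isinstance(ty, Pi):
        if isinstance(ty.dom, TypeL) and ty.s == 0:
            # computationally irrelevant Type parameter: universal type
            return Forall(ty.var, Star(ty.dom.level), encode(ty.cod))
        if ty.r == 0:
            # parameter irrelevant in types: ordinary function type
            return Arrow(encode(ty.dom), encode(ty.cod))
    raise ValueError("type lies outside the encodable fragment")

# (a :_(0,2) Type_0) -> (x :_(1,0) a) -> a, the type of a polymorphic identity
poly_id = Pi("a", 0, 2, TypeL(0), Pi("x", 1, 0, Var("a"), Var("a")))
print(encode(poly_id))
```

For `poly_id` the result is the SSF type $\forall a : \star_0.\ a \to a$, represented as `Forall('a', Star(0), Arrow(Var('a'), Var('a')))`.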
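We elide the full details but sketch key parts where functions and applications
are translated inductively (where $\mathsf{Ty}_l$ is shorthand for $\mathsf{Type}_l$):

\[
\left[\!\!\left[\,
\dfrac{(\Delta, \sigma_1 \mid \sigma_2, 0 \mid \sigma_3, r) \odot \Gamma, x : \mathsf{Ty}_l \vdash t : B}
      {(\Delta \mid \sigma_2 \mid \sigma_1 + \sigma_3) \odot \Gamma \vdash \lambda x.t : (x :_{(0,r)} \mathsf{Ty}_l) \to B}
\,\right]\!\!\right]
=
\dfrac{[\![\Gamma]\!], x : \star_l \vdash t_s : [\![B]\!]_\tau}
      {[\![\Gamma]\!] \vdash \Lambda(x : \star_l).t_s : \forall x : \star_l.\ [\![B]\!]_\tau}
\]

\[
\left[\!\!\left[\,
\dfrac{(\Delta, \sigma_1 \mid \sigma_2, s \mid \sigma_3, 0) \odot \Gamma, x : A \vdash t : B}
      {(\Delta \mid \sigma_2 \mid \sigma_1 + \sigma_3) \odot \Gamma \vdash \lambda x.t : (x :_{(s,0)} A) \to B}
\,\right]\!\!\right]
=
\dfrac{[\![\Gamma]\!], x : [\![A]\!]_\tau \vdash t_s : [\![B]\!]_\tau}
      {[\![\Gamma]\!] \vdash \lambda(x : [\![A]\!]_\tau).t_s : [\![A]\!]_\tau \to [\![B]\!]_\tau}
\]

\[
\left[\!\!\left[\,
\dfrac{\begin{array}{l}
        (\Delta \mid \sigma_2 \mid \sigma_1 + \sigma_3) \odot \Gamma \vdash t_1 : (x :_{(0,r)} \mathsf{Ty}_l) \to B \\
        (\Delta \mid \sigma_4 \mid \sigma_1) \odot \Gamma \vdash t_2 : \mathsf{Ty}_l
       \end{array}}
      {(\Delta \mid \sigma_2 \mid \sigma_3 + r * \sigma_4) \odot \Gamma \vdash t_1\, t_2 : [t_2/x]B}
\,\right]\!\!\right]
=
\dfrac{\begin{array}{l}
        [\![\Gamma]\!] \vdash t_s : \forall(x : \star_l).\ [\![B]\!]_\tau \\
        [\![\Gamma]\!] \vdash T : \star_l
       \end{array}}
      {[\![\Gamma]\!] \vdash t_s\,[T] : [T/x][\![B]\!]_\tau}
\]

\[
\left[\!\!\left[\,
\dfrac{\begin{array}{l}
        (\Delta \mid \sigma_2 \mid \sigma_1 + \sigma_3) \odot \Gamma \vdash t_1 : (x :_{(s,0)} A) \to B \\
        (\Delta \mid \sigma_4 \mid \sigma_1) \odot \Gamma \vdash t_2 : A
       \end{array}}
      {(\Delta \mid \sigma_2 + s * \sigma_4 \mid \sigma_3) \odot \Gamma \vdash t_1\, t_2 : [t_2/x]B}
\,\right]\!\!\right]
=
\dfrac{\begin{array}{l}
        [\![\Gamma]\!] \vdash t_s : [\![A]\!]_\tau \to [\![B]\!]_\tau \\
        [\![\Gamma]\!] \vdash t_s' : [\![A]\!]_\tau
       \end{array}}
      {[\![\Gamma]\!] \vdash t_s\, t_s' : [t_s'/x][\![B]\!]_\tau}
\]

In the last case, note the presence of $[t_s'/x][\![B]\!]_\tau$. Reasoning under the context of
the encoding, this is proven equivalent to $[\![B]\!]_\tau$ since the subject type grade is 0
and therefore use of $x$ in $B$ is irrelevant.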


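Theorem 2 (Soundness and completeness of SSF embedding). Given
$\mathsf{Ssf}((\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A)$ and $t_s$ in SSF where
$[\![(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A]\!] = [\![\Gamma]\!] \vdash t_s : [\![A]\!]_\tau$,
then for CBN reduction $\leadsto^{\mathsf{Ssf}}$ in Stratified System F:

\[
\begin{array}{ll}
\text{(soundness)} & \forall t'.\ t \leadsto t' \implies \exists t_s'.\ t_s \leadsto^{\mathsf{Ssf}} t_s' \\
 & \quad \wedge\ [\![(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t' : A]\!] = [\![\Gamma]\!] \vdash t_s' : [\![A]\!]_\tau \\[4pt]
\text{(completeness)} & \forall t_s'.\ t_s \leadsto^{\mathsf{Ssf}} t_s' \implies \exists t'.\ t \leadsto t' \\
 & \quad \wedge\ [\![(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t' : A]\!] = [\![\Gamma]\!] \vdash t_s' : [\![A]\!]_\tau
\end{array}
\]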


Thus, we can capture parametricity in GRTT via the judicious use of 0 grading 
(at either the type or computational level) for quantitative semirings. This em- 
bedding is not possible from QTT since QTT variables graded with 0 may be 
used arbitrarily in the types; the embedding here relies on GRTT’s 0 type-grade 
capturing absence in types for quantitative semirings.


3.5 Graded Modal Types and Non-dependent Linear Types 


GRTT can embed the reasoning present in other graded modal type theories 
(which often have a linear base), for example the explicit semiring-graded neces- 
sity modality found in coeffect calculi [10,23] and Granule [46]. We can recover 
the axioms of a graded necessity modality (usually modelled by an exponential 
graded comonad [23]). For example, in Gerty the following are well typed: 


counit : (a : (.0, .2) Type) -> (z : (.1, .0) [.1] a) -> a
counit = \a z -> case z of [y] -> y

comult : (a : (.0, .2) Type) -> (z : (.1, .0) [.6] a) -> [.2] ([.3] a)
comult = \a z -> case z of [y] -> [[y]]




corresponding to $\varepsilon : \Box_1 A \to A$ and $\delta_{r,s} : \Box_{r * s} A \to \Box_r(\Box_s A)$: operations of
graded necessity / graded comonads. Since we cannot use arbitrary terms for 
grades in the implementation, we have picked some particular grades here for 
comult. First-class grading is future work, discussed in Section 6. 

Linear functions can be captured as $A \multimap B \triangleq (x :_{(1,r)} A) \to B$ for an exact
usage semiring. It is straightforward to characterise a subset of GRTT programs
that maps to the linear $\lambda$-calculus akin to the encodings above. Thus, GRTT
provides a suitable basis for studying both linear and non-linear theories alike. 


4 Metatheory 


We now study GRTT’s metatheory. We first explain how substitution presents 
itself in the theory, and how type preservation follows from a relationship between 
equality and reduction. We then show admissibility of graded structural rules 
for contraction, exchange, and weakening, and strong normalization. 


4.1 Substitution 

We introduce substitution for well-formed contexts and then for typing.

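Lemma 4 (Substitution for well-formed contexts). If the following hold:
1. $(\Delta \mid \sigma_2 \mid \sigma_1) \odot \Gamma_1 \vdash t : A$ and 2. $(\Delta, \sigma_1, \Delta') \odot \Gamma_1, x : A, \Gamma_2 \vdash$

Then: $\Delta, (\Delta' \backslash |\Delta| + (\Delta' / |\Delta|) * \sigma_2) \odot \Gamma_1, [t/x]\Gamma_2 \vdash$

That is, given $\Gamma_1, x : A, \Gamma_2$ is well-formed, we can cut out $x$ by substituting $t$ for
$x$ in $\Gamma_2$, accounting for the new usage in the context grade vectors. The usage of
$\Gamma_1$ in $t$ is given by $\sigma_2$, and the usage in $A$ by $\sigma_1$. When substituting, $\Delta$ remains
the same, as $\Gamma_1$ is unchanged. However, to account for the usage in $[t/x]\Gamma_2$, we
have to form a new context grade vector $\Delta' \backslash |\Delta| + (\Delta' / |\Delta|) * \sigma_2$.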

The operation $\Delta' \backslash |\Delta|$ (pronounced ‘discard’) removes grades corresponding
to $x$, by removing the grade at index $|\Delta|$ from each grade vector in $\Delta'$.
Everything previously used in the typing of $x$ in the context must now be distributed
across $[t/x]\Gamma_2$, which is done by adding on $(\Delta' / |\Delta|) * \sigma_2$, which uses $\Delta' / |\Delta|$
(pronounced ‘choose’) to produce a vector of grades, which correspond to the
grades cut out in $\Delta' \backslash |\Delta|$. The multiplication $(\Delta' / |\Delta|) * \sigma_2$ produces a context
grade vector by scaling $\sigma_2$ by each element of $(\Delta' / |\Delta|)$. When adding vectors,
if the sizes of the vectors are different, then the shorter vector is right-padded
with zeroes. Thus $\Delta' \backslash |\Delta| + (\Delta' / |\Delta|) * \sigma_2$ can be read as ‘$\Delta'$ without the grades
corresponding to $x$, plus the usage of $t$ scaled by the prior usage of $x$’.

For example, given typing $((), (1) \mid 0, 1 \mid 1, 0) \odot a : \mathsf{Type}, y : a \vdash y : a$ and
well-formed context $((), (1), (1,0), (0,0,2)) \odot a : \mathsf{Type}, y : a, x : a, z : t' \vdash$, where
$t'$ uses $x$ twice, we can substitute $y$ for $x$. Therefore, let $\Gamma_1 = a : \mathsf{Type}, y : a$ thus
$|\Gamma_1| = 2$ and $\Gamma_2 = z : t'$ and $\Delta' = ((0,0,2))$ and $\sigma_1 = 1, 0$ and $\sigma_2 = 0, 1$. Then
the context grade of the substitution $[y/x]\Gamma_2$ is calculated as:

\[ ((0,0,2)) \backslash |\Gamma_1| = ((0,0)) \qquad (((0,0,2)) / |\Gamma_1|) * \sigma_2 = (2) * (0,1) = ((0,2)) \]

Thus the resulting judgment is $((), (1), (0,2)) \odot a : \mathsf{Type}, y : a, z : [y/x]t' \vdash$.

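
The discard, choose, scale and add steps of this calculation can be replayed mechanically. The sketch below is only an illustration: it assumes grades are natural numbers, represents each grade vector as a Python tuple and a context grade vector as a list of tuples, and uses 0-based indices. The function names are invented for the example.

```python
def discard(delta, i):
    """Drop the grade at index i from every vector (the 'discard' operation)."""
    return [v[:i] + v[i + 1:] for v in delta]

def choose(delta, i):
    """Collect the grade at index i from every vector (the 'choose' operation)."""
    return [v[i] for v in delta]

def scale(grades, sigma):
    """Produce one scaled copy of sigma per grade in `grades`."""
    return [tuple(s * g for g in sigma) for s in grades]

def add(v, w):
    """Pointwise addition, right-padding the shorter vector with zeroes."""
    n = max(len(v), len(w))
    v = tuple(v) + (0,) * (n - len(v))
    w = tuple(w) + (0,) * (n - len(w))
    return tuple(a + b for a, b in zip(v, w))

def subst_context_grades(delta_prime, i, sigma2):
    """Compute discard(delta', i) + choose(delta', i) * sigma2, vector by vector."""
    kept = discard(delta_prime, i)
    extra = scale(choose(delta_prime, i), sigma2)
    return [add(k, e) for k, e in zip(kept, extra)]

delta_prime = [(0, 0, 2)]   # grades for z : t', which uses x twice
sigma2 = (0, 1)             # subject grades of y in the context a : Type, y : a
print(subst_context_grades(delta_prime, 2, sigma2))  # [(0, 2)]
```

The printed vector is the last component of the resulting context grade vector $((), (1), (0,2))$ in the example above.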
Lemma 5 (Substitution for typing). If the following premises hold:

1. $(\Delta \mid \sigma_2 \mid \sigma_1) \odot \Gamma_1 \vdash t : A$
2. $(\Delta, \sigma_1, \Delta' \mid \sigma_3, s, \sigma_4 \mid \sigma_5, r, \sigma_6) \odot \Gamma_1, x : A, \Gamma_2 \vdash t' : B$
3. $|\sigma_3| = |\sigma_5| = |\Gamma_1|$

Then
\[
\left( \begin{array}{l}
\Delta, (\Delta' \backslash |\Delta| + (\Delta' / |\Delta|) * \sigma_2) \\
(\sigma_3 + s * \sigma_2), \sigma_4 \\
(\sigma_5 + r * \sigma_2), \sigma_6
\end{array} \right)
\odot \Gamma_1, [t/x]\Gamma_2 \vdash [t/x]t' : [t/x]B.
\]

As with substitution for well-formed contexts, we account for the replacement of
$x$ with $t$ in $\Gamma_2$ by ‘cutting out’ $x$ from the context grade vectors, and adding on
the grades required to form $t$, scaled by the grades that described $x$’s usage. We
additionally must account for the altered subject and subject-type usage. We do
this in a similar manner, by taking, for example, the usage of $\Gamma_1$ in the subject
($\sigma_3$), and adding on the grades required to form $t$, scaled by the grade with
which $x$ was previously used ($s$). Subject-type grades are calculated similarly.


4.2 Type Preservation 


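Lemma 6 (Reduction implies equality). If $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_1 : A$ and
$t_1 \leadsto t_2$, then $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_1 = t_2 : A$.


Lemma 7 (Equality inversion). If $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_1 = t_2 : A$, then
$(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_1 : A$ and $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t_2 : A$.


Lemma 8 (Type preservation). If $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A$ and $t \leadsto t'$, then
$(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t' : A$.


Proof. By Lemma 6 we have $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t = t' : A$, and therefore by
Lemma 7 we have $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t' : A$, as required.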


4.3 Structural Rules 
We now consider the structural rules of contraction, exchange, and weakening. 
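Lemma 9 (Contraction). The following rule is admissible:

\[
\dfrac{
  \left( \begin{array}{l}
    \Delta_1, \sigma_1, (\sigma_1, 0), \Delta_2 \\
    \sigma_2, s_1, s_2, \sigma_3 \\
    \sigma_4, r_1, r_2, \sigma_5
  \end{array} \right)
  \odot \Gamma_1, x : A, y : A, \Gamma_2 \vdash t : B
  \qquad |\Delta_1| = |\sigma_2| = |\sigma_4| = |\Gamma_1|
}{
  \left( \begin{array}{l}
    \Delta_1, \sigma_1, \mathsf{contr}(|\Delta_1|; \Delta_2) \\
    \sigma_2, (s_1 + s_2), \sigma_3 \\
    \sigma_4, (r_1 + r_2), \sigma_5
  \end{array} \right)
  \odot \Gamma_1, z : A, [z,z/x,y]\Gamma_2 \vdash [z,z/x,y]t : [z,z/x,y]B
}\ \textsc{Contr}
\]

The operation $\mathsf{contr}(\pi; \Delta)$ contracts the elements at index $\pi$ and $\pi + 1$ for each
vector in $\Delta$ by combining them with the semiring addition, defined
$\mathsf{contr}(\pi; \Delta) = \Delta \backslash (\pi + 1) + \Delta / (\pi + 1) * (0^\pi, 1)$. Admissibility follows from the
semiring addition, which serves to contract dependencies, being threaded throughout the rules.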
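
Operationally, the contraction operation merges two adjacent grades in every vector. A minimal Python sketch, assuming natural-number grades, tuples for grade vectors and 0-based indices (the function name simply mirrors the operation in the text):

```python
def contr(pi, delta):
    """In every vector of delta, add the grade at index pi + 1 onto the grade
    at index pi and drop index pi + 1."""
    out = []
    for v in delta:
        merged = v[pi] + v[pi + 1]            # semiring addition of the two grades
        out.append(v[:pi] + (merged,) + v[pi + 2:])
    return out

# Two made-up vectors from a trailing context; indices 1 and 2 are contracted.
print(contr(1, [(4, 1, 2, 0), (0, 3, 5, 7, 1)]))  # [(4, 3, 0), (0, 8, 7, 1)]
```

This agrees with the defining equation: discarding index $\pi + 1$ and then adding the discarded grade back at position $\pi$.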
Lemma 10 (Exchange). The following rule is admissible:

\[
\dfrac{
  x \notin \mathsf{FV}(B)
  \qquad |\Delta_1| = |\sigma_3| = |\sigma_5| = |\Gamma_1|
  \qquad
  \left( \begin{array}{l}
    \Delta_1, \sigma_1, (\sigma_2, 0), \Delta_2 \\
    \sigma_3, s_1, s_2, \sigma_4 \\
    \sigma_5, r_1, r_2, \sigma_6
  \end{array} \right)
  \odot \Gamma_1, x : A, y : B, \Gamma_2 \vdash t : C
}{
  \left( \begin{array}{l}
    \Delta_1, \sigma_2, (\sigma_1, 0), \mathsf{exch}(|\Delta_1|; \Delta_2) \\
    \sigma_3, s_2, s_1, \sigma_4 \\
    \sigma_5, r_2, r_1, \sigma_6
  \end{array} \right)
  \odot \Gamma_1, y : B, x : A, \Gamma_2 \vdash t : C
}\ \textsc{Exc}
\]

Notice that if you strip away the vector fragment and sizing premise, this is
exactly the form of exchange we would expect in a dependent type theory: if
$x$ and $y$ are assumptions in a context typing $t : C$, and the type of $y$ does not
depend upon $x$, then we can type $t : C$ when we swap the order of $x$ and $y$.

The action on grade vectors is simple: we swap the grades associated with
each of the variables. For the context grade vector however, we must do two
things: first, we capture the formation of $A$ with $\sigma_1$, and the formation of $B$
with $\sigma_2, 0$ (indicating $x$ being used with grade 0 in $B$), then swap these around,
cutting the final grade from $\sigma_2, 0$, and adding 0 to the end of $\sigma_1$, to ensure
correct sizing. Next, the operation $\mathsf{exch}(|\Delta_1|; \Delta_2)$ swaps the element at index
$|\Delta_1|$ (i.e., that corresponding to usage of $x$) with the element at index $|\Delta_1| + 1$
(corresponding to $y$) for every vector in $\Delta_2$; this exchange operation ensures that
usage in the trailing context is reordered appropriately.
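
Both steps of the action on the context grade vector are simple list manipulations. The following Python sketch assumes natural-number grades, tuples for grade vectors and 0-based indices; `swap_formation` is a hypothetical helper name for the first step, while `exch` mirrors the operation named in the text.

```python
def swap_formation(sigma1, sigma2_0):
    """Swap the formation vectors of A and B: cut the final 0 from (sigma2, 0)
    and append a 0 to sigma1 so that both keep the right size."""
    assert sigma2_0[-1] == 0, "x must be used with grade 0 in B"
    return sigma2_0[:-1], sigma1 + (0,)

def exch(pi, delta):
    """Swap the grades at indices pi and pi + 1 in every vector of delta."""
    return [v[:pi] + (v[pi + 1], v[pi]) + v[pi + 2:] for v in delta]

print(swap_formation((1, 0), (0, 1, 0)))           # ((0, 1), (1, 0, 0))
print(exch(2, [(0, 1, 5, 9), (1, 0, 2, 3, 4)]))    # [(0, 1, 9, 5), (1, 0, 3, 2, 4)]
```

The concrete vectors are made up for the example; only the shape of the two operations is taken from the lemma.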


Lemma 11 (Weakening). The following rule is admissible:

\[
\dfrac{
  (\Delta_1, \Delta_2 \mid \sigma_1, \sigma_1' \mid \sigma_2, \sigma_2') \odot \Gamma_1, \Gamma_2 \vdash t : B
  \qquad (\Delta_1 \mid \sigma_3 \mid \mathbf{0}) \odot \Gamma_1 \vdash A : \mathsf{Type}_l
  \qquad |\sigma_1| = |\sigma_2| = |\Gamma_1|
}{
  (\Delta_1, \sigma_3, \mathsf{ins}(|\Delta_1|; 0; \Delta_2) \mid \sigma_1, 0, \sigma_1' \mid \sigma_2, 0, \sigma_2') \odot \Gamma_1, x : A, \Gamma_2 \vdash t : B
}\ \textsc{Weak}
\]

Weakening introduces irrelevant assumptions to a context. We do this by capturing
the usage in the formation of the assumption’s type with $\sigma_3$ to preserve the
well-formedness of the context. We then indicate irrelevance of the assumption
by grading with 0 in appropriate places. The operation $\mathsf{ins}(\pi; s; \Delta)$ inserts the
element $s$ at index $\pi$ for each $\sigma$ in $\Delta$, such that all elements preceding index $\pi$
(in $\sigma$) keep their positions, and every element at index $\pi$ or greater (in $\sigma$) will
be shifted one index later in the new vector. The 0 grades in the subject and
subject-type grade vector positions correspond to the absence of the irrelevant
assumption from the subject and subject’s type.
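
The insertion operation can be sketched in a single line of Python, again assuming natural-number grades, tuples for grade vectors and 0-based indices:

```python
def ins(pi, s, delta):
    """Insert grade s at index pi of every vector in delta; elements at index
    pi or later move one position to the right."""
    return [v[:pi] + (s,) + v[pi:] for v in delta]

# Weakening inserts the grade 0 for the new, unused assumption.
print(ins(1, 0, [(2, 3), (1, 4, 6)]))  # [(2, 0, 3), (1, 0, 4, 6)]
```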


4.4 Strong Normalization 


We adapt Geuvers’ strong normalization proof for the Calculus of Constructions
(CC) [24] to a fragment of GRTT (called GRTT$^{\{0,1\}}$) restricted to two universe
levels and without variables of type $\mathsf{Type}_1$. This results in a less expressive system
than full GRTT when it comes to higher kinds, but this is orthogonal to the main
idea here of grading. We briefly overview the strong normalization proof; details
can be found in the extended version [44]. Note this strong normalization result
is with respect to $\beta$-reduction only (our semantics does not include $\eta$-reduction).

We use the proof technique of saturated sets, based on the reducibility candidates
of Girard [29]. While GRTT$^{\{0,1\}}$ has a collapsed syntax we use judgments
to break typing up into stages. We use these sets to match on whether a term is
a kind, type, constructor, or a function (we will refer to these as terms).


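Definition 3. Typing can be broken up into the following stages:

\[
\begin{array}{lcl}
\mathsf{Kind} & := & \{ A \mid \exists \Delta, \sigma_1, \Gamma.\ (\Delta \mid \sigma_1 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_1 \} \\
\mathsf{Type} & := & \{ A \mid \exists \Delta, \sigma_1, \Gamma.\ (\Delta \mid \sigma_1 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_0 \} \\
\mathsf{Con} & := & \{ t \mid \exists \Delta, \sigma_1, \sigma_2, \Gamma, A.\ (\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A \wedge (\Delta \mid \sigma_2 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_1 \} \\
\mathsf{Term} & := & \{ t \mid \exists \Delta, \sigma_1, \sigma_2, \Gamma, A.\ (\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A \wedge (\Delta \mid \sigma_2 \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_0 \}
\end{array}
\]


Lemma 12 (Classification). We have $\mathsf{Kind} \cap \mathsf{Type} = \emptyset$ and $\mathsf{Con} \cap \mathsf{Term} = \emptyset$.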


The classification lemma states that we can safely case split over kinds and types, 
or constructors and terms without fear of an overlap occurring. 

Saturated sets are essentially collections of strongly normalizing terms that
are closed under $\beta$-reduction. The intuition behind this proof is that every
typable program ends up in some saturated set, and hence, is strongly normalizing.


Definition 4. [Base terms and saturated terms] Informally, the set of base
terms $\mathcal{B}$ is inductively defined from variables and $\mathsf{Type}_0$ and $\mathsf{Type}_1$, and
compound terms over base $\mathcal{B}$ and strongly normalising terms $\mathsf{SN}$.

A set of terms $X$ is saturated if $X \subseteq \mathsf{SN}$, $\mathcal{B} \subseteq X$, and if $\mathsf{red}_k\, t \in X$ and
$t \in \mathsf{SN}$, then $t \in X$. Thus saturated sets are closed under strongly normalizing
terms with a key redex, denoted $\mathsf{red}_k\, t$, which are redexes or a redex at the head
of an elimination form. $\mathsf{SAT}$ denotes the collection of saturated sets.


Lemma 13 (SN saturated). All saturated sets are non-empty; SN is saturated. 


Since GRTT$^{\{0,1\}}$ allows computation in types as well as in terms, we separate the
interpretations for kinds and types, where the former is a set of the latter.


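Definition 5. For $A \in \mathsf{Kind}$, the kind interpretation, $\mathcal{K}[\![A]\!]$, is defined:

\[
\begin{array}{lcll}
\mathcal{K}[\![\mathsf{Type}_0]\!] & = & \mathsf{SAT} & \\
\mathcal{K}[\![\Box_s A]\!] & = & \mathcal{K}[\![A]\!] & \\
\mathcal{K}[\![(x :_{(s,r)} A) \to B]\!] & = & \{ f \mid f : \mathcal{K}[\![A]\!] \to \mathcal{K}[\![B]\!] \} & \text{if } A, B \in \mathsf{Kind} \\
\mathcal{K}[\![(x :_{(s,r)} A) \to B]\!] & = & \mathcal{K}[\![A]\!] & \text{if } A \in \mathsf{Kind},\ B \in \mathsf{Type} \\
\mathcal{K}[\![(x :_{(s,r)} A) \to B]\!] & = & \mathcal{K}[\![B]\!] & \text{if } A \in \mathsf{Type},\ B \in \mathsf{Kind} \\
\mathcal{K}[\![(x :_r A) \otimes B]\!] & = & \mathcal{K}[\![A]\!] \times \mathcal{K}[\![B]\!] & \text{if } A, B \in \mathsf{Kind} \\
\mathcal{K}[\![(x :_r A) \otimes B]\!] & = & \mathcal{K}[\![A]\!] & \text{if } A \in \mathsf{Kind},\ B \in \mathsf{Type} \\
\mathcal{K}[\![(x :_r A) \otimes B]\!] & = & \mathcal{K}[\![B]\!] & \text{if } A \in \mathsf{Type},\ B \in \mathsf{Kind}
\end{array}
\]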


Next we define the interpretation of types, which requires the interpretation to be
parametric on an interpretation of type variables called a type valuation. This is
necessary to make the interpretation well-founded (first realized by Girard [29]).


Definition 6. Type valuations, $\Delta \odot \Gamma \models \varepsilon$, are defined as follows:

\[
\dfrac{}{\emptyset \odot \emptyset \models \emptyset}
\qquad
\dfrac{X \in \mathcal{K}[\![A]\!] \quad \Delta \odot \Gamma \models \varepsilon \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_1}
      {(\Delta, \sigma) \odot (\Gamma, x : A) \models \varepsilon[x \mapsto X]}
\qquad
\dfrac{\Delta \odot \Gamma \models \varepsilon \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_0}
      {(\Delta, \sigma) \odot (\Gamma, x : A) \models \varepsilon}\ \textsc{Tm}
\]

Type valuations ignore term variables (rule Tm); in fact, the interpretations
of both types and kinds ignore them because we are defining sets of terms
over types, and thus terms in types do not contribute to the definition of these
sets. However, as these interpretations define sets of open terms we must carry a
graded context around where necessary. Thus, type valuations are with respect
to a well-formed graded context $\Delta \odot \Gamma$. We now outline the type interpretation.


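Definition 7. For type valuation $\Delta \odot \Gamma \models \varepsilon$ and a type $A \in (\mathsf{Kind} \cup \mathsf{Type} \cup \mathsf{Con})$
with $A$ typable in $\Delta \odot \Gamma$, the interpretation of types $[\![A]\!]_\varepsilon$ is defined inductively.
For brevity, we list just a few illustrative cases, including modalities and some
function cases; the complete definition is given in the extended version [44].

\[
\begin{array}{lcll}
[\![\mathsf{Type}_1]\!]_\varepsilon & = & \mathsf{SN} & \\
[\![\mathsf{Type}_0]\!]_\varepsilon & = & \lambda X \in \mathsf{SAT}.\ \mathsf{SN} & \\
[\![x]\!]_\varepsilon & = & \varepsilon\, x & \text{if } x \in \mathsf{Con} \\
[\![\Box_s A]\!]_\varepsilon & = & [\![A]\!]_\varepsilon & \\
[\![\lambda x : A.\ B]\!]_\varepsilon & = & \lambda X \in \mathcal{K}[\![A]\!].\ [\![B]\!]_{\varepsilon[x \mapsto X]} & \text{if } A \in \mathsf{Kind},\ B \in \mathsf{Con} \\
[\![A\ B]\!]_\varepsilon & = & [\![A]\!]_\varepsilon([\![B]\!]_\varepsilon) & \text{if } B \in \mathsf{Con} \\
[\![(x :_{(s,r)} A) \to B]\!]_\varepsilon & = & \lambda X \in \mathcal{K}[\![A]\!] \to \mathcal{K}[\![B]\!].\ \bigcap_{Y \in \mathcal{K}[\![A]\!]} ([\![A]\!]_\varepsilon\, Y \to [\![B]\!]_{\varepsilon[x \mapsto Y]}\, (X\,(Y))) & \text{if } A, B \in \mathsf{Kind}
\end{array}
\]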


Grades play no role in the reduction relation for GRTT, and hence, our inter- 
pretation erases graded modalities and their introductory and elimination forms 
(translated into substitutions). In fact, the above interpretation can be seen as a 
translation of GRTT$^{\{0,1\}}$ into non-substructural set theory; there is no data-usage
tracking in the image of the interpretation. Tensors are translated into Cartesian 
products whose eliminators are translated into substitutions similarly to graded 
modalities. All terms however remain well-typed through the interpretation. 

The interpretation of terms corresponds to term valuations that are used to 
close the term before interpreting it into the interpretation of its type. 


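Definition 8. Valid term valuations, $\Delta \odot \Gamma \models_\varepsilon \rho$, are defined as follows:

\[
\dfrac{}{\emptyset \odot \emptyset \models_\emptyset \emptyset}
\qquad
\dfrac{t \in ([\![A]\!]_\varepsilon)(\varepsilon\, x) \quad \Delta \odot \Gamma \models_\varepsilon \rho \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_1}
      {(\Delta, \sigma) \odot \Gamma, x : A \models_\varepsilon \rho[x \mapsto t]}
\qquad
\dfrac{t \in [\![A]\!]_\varepsilon \quad \Delta \odot \Gamma \models_\varepsilon \rho \quad (\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_0}
      {(\Delta, \sigma) \odot \Gamma, x : A \models_\varepsilon \rho[x \mapsto t]}
\]

We interpret terms as substitutions, but graded modalities must be erased and
their elimination forms converted into substitutions (and similarly for the
eliminator for tensor products).


Definition 9. Suppose $\Delta \odot \Gamma \models_\varepsilon \rho$. Then the interpretation of a term $t$
typable in $\Delta \odot \Gamma$ is $(\!|t|\!)_\rho = \rho\, t$, but where all let-expressions are translated into
substitutions, and all graded modalities are erased.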


Finally, we prove our main result using semantic typing which will imply strong
normalization. Suppose $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A$, then:




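Definition 10. Semantic typing, $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \models t : A$, is defined as follows:

1. If $(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_1$, then for every $\Delta \odot \Gamma \models_\varepsilon \rho$, $(\!|t|\!)_\rho \in [\![A]\!]_\varepsilon([\![t]\!]_\varepsilon)$.
2. If $(\Delta \mid \sigma \mid \mathbf{0}) \odot \Gamma \vdash A : \mathsf{Type}_0$, then for every $\Delta \odot \Gamma \models_\varepsilon \rho$, $(\!|t|\!)_\rho \in [\![A]\!]_\varepsilon$.


Theorem 3 (Soundness for Semantic Typing). $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \models t : A$.


Corollary 1 (Strong Normalization). We have $t \in \mathsf{SN}$.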


5 Implementation 


Our implementation Gerty is based on a bidirectionalised version of the typing 
rules here, somewhat following traditional schemes of bidirectional typing [19,20] 
but with grading (similar to Granule [46] but adapted considerably for the de- 
pendent setting). We briefly outline the implementation scheme and highlight 
a few key points, rules, and examples. We use this implementation to explore 
further applications of GRTT, namely optimising type checking algorithms. 
Bidirectional typing splits declarative typing rules into check and infer modes.
Furthermore, bidirectional GRTT rules split the grading context (left of $\odot$) into
input and output contexts where $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash t : A$ is implemented via:

\[ \text{(check)}\quad \Delta; \Gamma \vdash t \Leftarrow A; \sigma_1; \sigma_2 \qquad \text{or} \qquad \text{(infer)}\quad \Delta; \Gamma \vdash t \Rightarrow A; \sigma_1; \sigma_2 \]

where $\Leftarrow$ rules check that $t$ has type $A$ and $\Rightarrow$ rules infer (calculate) that $t$
has type $A$. In both judgments, the context grading $\Delta$ and context $\Gamma$ left of
$\vdash$ are inputs whereas the grade vectors $\sigma_1$ and $\sigma_2$ to the right of $A$ are outputs.
This input-output context approach resembles that employed in linear
type checking [5,32,62]. Rather than following a “left over” scheme as in these 
works (where the output context explains what resources are left), the output 
grades here explain what has been used according to the analysis of grading 
(‘adding up’ rather than ‘taking away’). 
For example, the following is the infer rule for function elimination:

\[
\dfrac{
\begin{array}{l}
\Delta; \Gamma \vdash t_1 \Rightarrow (x :_{(s,r)} A) \to B; \sigma_2; \sigma_{13} \\
\Delta; \Gamma \vdash t_2 \Leftarrow A; \sigma_4; \sigma_1 \\
\Delta, \sigma_1; \Gamma, x : A \vdash B \Rightarrow \mathsf{Type}_l; \sigma_3, r; \mathbf{0} \qquad \sigma_{13} = \sigma_1 + \sigma_3
\end{array}
}{
\Delta; \Gamma \vdash t_1\, t_2 \Rightarrow [t_2/x]B; \sigma_2 + s * \sigma_4; \sigma_3 + r * \sigma_4
}\ {\Rightarrow}\lambda_e
\]

The rule can be read by starting at the input of the conclusion (left of $\vdash$), then
reading top down through each premise, to calculate the output grades in the
rule’s conclusion. Any concrete value or already-bound variable appearing in the
output grades of a premise can be read as causing an equality check in the type
checker. The last premise checks that the output subject-type grade $\sigma_{13}$ from
the first premise matches $\sigma_1 + \sigma_3$ (which were calculated by later premises).
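
To make the flow of grades through this rule concrete, the following Python sketch computes the conclusion's output grades from the outputs of the three premises. It is not taken from Gerty: it assumes natural-number grades, tuples for grade vectors, and an invented function name and argument order.

```python
def vec_add(v, w):
    assert len(v) == len(w), "grade vectors must have the same length"
    return tuple(a + b for a, b in zip(v, w))

def vec_scale(k, v):
    return tuple(k * a for a in v)

def infer_app_grades(s, r, sigma2, sigma13, sigma4, sigma1, sigma3):
    """Output grades (subject, subject-type) for the application t1 t2."""
    # Last premise: the subject-type grade of t1 must split as sigma1 + sigma3.
    if sigma13 != vec_add(sigma1, sigma3):
        raise TypeError("subject-type grade of the function does not match")
    subject = vec_add(sigma2, vec_scale(s, sigma4))        # sigma2 + s * sigma4
    subject_type = vec_add(sigma3, vec_scale(r, sigma4))   # sigma3 + r * sigma4
    return subject, subject_type

# Made-up grades over a two-variable context, with binder grades (s, r) = (2, 0).
print(infer_app_grades(s=2, r=0,
                       sigma2=(1, 0), sigma13=(1, 0),
                       sigma4=(0, 1), sigma1=(1, 0), sigma3=(0, 0)))
# ((1, 2), (0, 0))
```

The equality test on `sigma13` plays the role of the equality check that the text describes for already-bound variables in premise outputs.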

In contrast, function introduction is a check rule:

\[
\dfrac{
\Delta; \Gamma \vdash A \Rightarrow \mathsf{Type}_l; \sigma_1; \mathbf{0}
\qquad
\Delta, \sigma_1; \Gamma, x : A \vdash t \Leftarrow B; \sigma_2, s; \sigma_3, r
}{
\Delta; \Gamma \vdash \lambda x.t \Leftarrow (x :_{(s,r)} A) \to B; \sigma_2; \sigma_1 + \sigma_3
}\ {\Leftarrow}\lambda_i
\]

Thus, dependent functions can be checked against type $(x :_{(s,r)} A) \to B$ given
input $\Delta; \Gamma$ by first inferring the type of $A$ and checking that its output
subject-type grade comprises all zeros $\mathbf{0}$. Then the body of the function $t$ is checked
against $B$ under the context $\Delta, \sigma_1; \Gamma, x : A$ producing grade vectors $\sigma_2, s'$ and
$\sigma_3, r'$ where it is checked that $s = s'$ and $r = r'$ (described implicitly in the rule),
i.e., the calculated grades match those of the binder.

The implementation anticipates some further work for GRTT: the potential 
for grades which are first-class terms, for which we anticipate complex equations 
on grades. For grade equality, Gerty has two modes: one which normalises 
terms and then compares for syntactic equality, and the other which discharges 
constraints via an off-the-shelf SMT solver (we use Z3 [17]). We discuss briefly 
some performance implications in the next section. 


Using Grades to Optimise Type Checking Abel posited that a dependent theory 
with quantitative resource tracking at the type level could leverage linearity- 
like optimisations in type checking [2]. Our implementation provides a research 
vehicle for exploring this idea; we consider one possible optimisation here. 

Key to dependent type checking is the substitution of terms into types in 
elimination forms (i.e., application, tensor elimination). However, in a quanti- 
tative semiring setting, if a variable has 0 subject-type grade, then we know it 
is irrelevant to type formation (it is not semantically depended upon, i.e., dur- 
ing normalisation). Subsequently, substitutions into a 0-graded variable can be 
elided (or allocations to a closure environment can be avoided). We implemented 
this optimisation in Gerty when inferring the type of an application $t_1\, t_2$
(rule ${\Rightarrow}\lambda_e$ above), where the type of $t_1$ is inferred as $(x :_{(s,0)} A) \to B$. For a
quantitative semiring we know that $x$ is irrelevant in $B$, thus we need not perform
the substitution $[t_2/x]B$ when type checking the application.
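
The idea behind the optimisation can be sketched on a toy syntax. The code below is not Gerty's implementation: it assumes terms are nested Python tuples without binders, uses a naive substitution, and invents the names `subst` and `app_result_type` purely for illustration.

```python
def subst(term, x, replacement):
    """Naive substitution [replacement/x]term on a binder-free toy syntax."""
    if term == ("var", x):
        return replacement
    if isinstance(term, tuple):
        return tuple(subst(t, x, replacement) if isinstance(t, tuple) else t
                     for t in term)
    return term

def app_result_type(x, r, body, arg, quantitative=True):
    """Type of `t1 t2` when t1 : (x :_(s, r) A) -> body and t2 = arg."""
    if quantitative and r == 0:
        # Subject-type grade 0: x is irrelevant in body, so the substitution
        # would return body unchanged and the traversal can be skipped.
        return body
    return subst(body, x, arg)

vec_n = ("app", ("var", "Vec"), ("var", "n"))
print(app_result_type("n", 1, vec_n, ("lit", 3)))       # substitution performed
print(app_result_type("x0", 0, ("var", "b"), ("lit", 3)))  # substitution elided
```

The saving comes from avoiding the traversal of `body`; for a 0-graded variable the skipped call would have returned an equal term anyway.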

We evaluate this on simple Gerty programs of an n-ary “fanout” combinator 
implemented via an n-ary application combinator, e.g., for arity 3: 


app3 : (a : (0, 6) Type 0) -> (b : (0, 2) Type 0)
    -> (x0 : (1, 0) a) -> (x1 : (1, 0) a) -> (x2 : (1, 0) a)
    -> (f : (1, 0) ((y0 : (1, 0) a) -> (y1 : (1, 0) a) -> (y2 : (1, 0) a) -> b)) -> b
app3 = \a -> \b -> \x0 -> \x1 -> \x2 -> \f -> f x0 x1 x2


fan3 : (a : (0, 4) Type 0) -> (b : (0, 2) Type 0)
    -> (f : (1, 0) ((z0 : (1, 0) a) -> (z1 : (1, 0) a) -> (z2 : (1, 0) a) -> b))
    -> (x : (3, 0) a) -> b
fan3 = \a -> \b -> \f -> \x -> app3 a b x x x f


Note that fan3 uses its parameter x three times (hence the grade 3) which then 
incurs substitutions into the type of app3 during type checking, but each such 
substitution is redundant since the type does not depend on these parameters, 
as reflected by the 0 subject-type grades. 

To evaluate the optimisation and SMT solving vs. normalisation-based equal- 
ity, we ran Gerty on the fan out program for arities from 3 to 8, with and without 
the optimisation and under the two equality approaches. 
              Normalisation                                      SMT
 n   Base ms            Optimised ms       Speedup    Base ms            Optimised ms       Speedup
 3     45.71 (1.72)       44.08 (1.28)     1.04         77.12 (2.65)       76.91 (2.36)     1.00
 4    108.75 (4.09)       89.73 (4.73)     1.21        136.18 (5.23)      162.95 (3.62)     0.84
 5    190.57 (8.31)      191.25 (8.13)     1.00        279.49 (15.73)     289.73 (23.30)    0.96
 6    552.11 (29.00)     445.26 (23.50)    1.24        680.11 (16.28)     557.08 (13.87)    1.22
 7   1821.49 (49.44)    1348.85 (26.37)    1.35       1797.09 (43.53)    1368.45 (20.16)    1.31
 8   6059.30 (132.01)   4403.10 (86.57)    1.38       5913.06 (118.83)   4396.90 (59.82)    1.34


Table 1. Performance analysis of grade-based optimisations to type checking. Times
in milliseconds to 2 d.p. with the standard error given in brackets. Measurements are
the mean of 10 trials (run on a 2.7 GHz Intel Core, 8 GB of RAM, Z3 4.8.8).


Table 1 gives the results. For grade equality by normalisation, the optimisation 
has a positive effect on speedup, getting increasingly significant (up to 38%) 
as the overall cost increases. For SMT-based grade equality, the optimisation 
causes a slowdown for arities 4 and 5 (and only breaks even for arity 3).
This is because working out whether the optimisation can be applied requires 
checking whether grades are equal to 0, which incurs extra SMT solver calls. 
Eventually, this cost is outweighed by the time saved by reducing substitutions. 
Since the grades here are all relatively simple, it is usually more efficient for the 
type checker to normalise and compare terms rather than compiling to SMT and 
starting up the external solver, as seen by longer times for the SMT approach. 
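
The speedup columns of Table 1 are the ratio of base time to optimised time, rounded to two decimal places. The short Python check below recomputes them from the table's own figures; only the variable and function names are invented.

```python
# (arity, base ms, optimised ms, reported speedup), copied from Table 1
normalisation = [
    (3, 45.71, 44.08, 1.04), (4, 108.75, 89.73, 1.21),
    (5, 190.57, 191.25, 1.00), (6, 552.11, 445.26, 1.24),
    (7, 1821.49, 1348.85, 1.35), (8, 6059.30, 4403.10, 1.38),
]
smt = [
    (3, 77.12, 76.91, 1.00), (4, 136.18, 162.95, 0.84),
    (5, 279.49, 289.73, 0.96), (6, 680.11, 557.08, 1.22),
    (7, 1797.09, 1368.45, 1.31), (8, 5913.06, 4396.90, 1.34),
]

def speedup(base_ms, optimised_ms):
    """Speedup as reported in the table: base time over optimised time."""
    return round(base_ms / optimised_ms, 2)

for name, rows in (("Normalisation", normalisation), ("SMT", smt)):
    for n, base, opt, reported in rows:
        assert speedup(base, opt) == reported
        print(f"{name:13s} n={n}: {speedup(base, opt):.2f}")
```

A speedup of 1.38 at arity 8 corresponds to the "up to 38%" figure quoted above, and values below 1.00 in the SMT columns are the slowdowns at arities 4 and 5.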

The baseline performance here is poor (the implementation is not highly optimised),
partly due to the overhead of frequently computing type formation judgments
in order to account accurately for grading. However, such checks are often recomputed
and could be optimised away by memoisation. Nevertheless, this experiment gives
evidence that grades can indeed be used to optimise type checking. A thorough
investigation of grade-directed optimisations is future work.


6 Discussion 


Grading, Coeffects, and Quantitative Types The notion of coeffects, describing 
how a program depends on its context, arose in the literature from two directions: 
as a dualisation of effect types [48,49] and a generalisation of Bounded Linear 
Logic to general resource semirings [25,10]. Coeffect systems can capture reuse 
bounds, information flow security [23], hardware scheduling constraints [25], and 
sensitivity for differential privacy [16,22]. A coeffect-style approach also enables 
linear types to be retrofitted to Haskell [8]. A common thread is the annotation 
of variables in the context with usage information, drawn from a semiring. Our 
approach generalises this idea to capture type, context, and computational usage. 

McBride [43] reconciles linear and dependent types, allowing types to depend 
on linear values, refined by Atkey [6] as Quantitative Type Theory. QTT employs 
coeffect-style annotation of each assumption in a context with an element of a 
resource accounting algebra, with judgments of the form: 


\[ x_1 :^{\rho_1} A_1, \ldots, x_n :^{\rho_n} A_n \vdash M :^{\rho} B \]

where $\rho_i, \rho$ are elements of a semiring, and $\rho = 0$ or $\rho = 1$, respectively denoting
a term which can be used in type formation (erased at runtime) or at runtime.
Dependent function arrows are of the form $(x :^{\rho} A) \to B$, where $\rho$ is a semiring
element that denotes the computational usage of the parameter.

Variables used for type formation but not computation are annotated by 0.
Subsequently, type formation rules are all of the form $0\Gamma \vdash T$, meaning every
variable assumption has a 0 annotation. GRTT is similar to QTT, but differs in
its more extensive grading to track usage in types, rather than blanketing all
type usage with 0. In Atkey’s formulation, a term can be promoted to a type if
its result and dependency quantities are all 0. A set of rules provide formation of
computational type terms, but these are also graded at 0. Subsequently, it is not
possible to construct an inhabitant of Type that can be used at runtime. We avoid
this shortcoming by allowing matching on types. For example, a computation $t$ that
inspects a type variable $a$ would be typed as
$(\Delta, \mathbf{0}, \Delta' \mid \sigma_1, 1, \sigma_1' \mid \sigma_2, r, \sigma_2') \odot \Gamma, a : \mathsf{Type}, \Gamma' \vdash t : B$,
denoting 1 computational use and $r$ type uses in $B$.

At first glance, it seems QTT could be encoded into GRTT taking the semiring
$R$ of QTT and parameterising GRTT by the semiring $R \cup \{\hat{0}\}$ where $\hat{0}$ denotes
arbitrary usage in type formation. However, there is impedance between the two
systems as QTT always annotates type use with 0. It is not clear how to make 
this happen in GRTT whilst still having non-0 tracking at the computational 
level, since we use one semiring for both. Exploring an encoding is future work. 

Choudhury et al. [13] give a system closely related to (but arguably simpler than)
QTT called GRAD. One key difference is that rather than annotating type usage
with 0, grades are simply ignored in types. This makes for a surprisingly flexible 
system. In addition, they show that irrelevance is captured by the 0 grade using 
a heap-based semantics (a result leveraged in Section 3). GRAD however does 
not have the power of type-grades presented here. 


Dependent Types and Modalities Dal Lago and Gaboardi extend PCF with lin- 
ear and lightweight dependent types [15] (then adapted for differential privacy 
analysis [22]). They add a natural number type indexed by upper and lower 
bound terms which index a modality. Combined with linear arrows of the form 
$[a < I].\sigma \multimap \tau$ these describe functions using the parameter at most $I$ times
(where the modality acts as a binder for index variable a which denotes in- 
stantiations). Their system is leveraged to give fine-grained cost analyses in the 
context of Implicit Computational Complexity. Whilst a powerful system, their 
approach is restricted in terms of dependency, where only a specialised type can 
depend on specialised natural-number indexed terms (which are non-linear). 

Gratzer et al. define a dependently-typed language with a Fitch-style modal- 
ity [30]. It seems that such an approach could also be generalised to a graded 
modality, although we have used the natural-deduction style for our graded 
modality rather than the Fitch-style. 

As discussed in Section 1, our approach closely resembles Abel’s resource- 
ful dependent types [2]. Our work expands on the idea, including tensors and 
the graded modalities. We also considerably develop the associated metatheory,
provide an implementation, and study applications.
Further Work One expressive extension is to capture analyses which have an
ordering, e.g., grading by a pre-ordered semiring, allowing a notion of approximation.
This would enable analyses such as bounded reuse from Bounded Linear
Logic [28], intervals with lower and upper bounds on use [46], and top-completed
semirings, with an $\infty$-element denoting arbitrary usage as a fall-back. We have
made progress into exploring the interaction between approximation and dependent
types, and the remainder of this is left as future work.

A powerful extension of GRTT for future work is to allow grades to be first- 
class terms. Typing rules in GRTT involving grades could be adapted to in- 
ternalise the elements as first-class terms. We could then, e.g., define the map 
function over sized vectors, which requires that the parameter function is used 
exactly the same number of times as the length of the vector: 


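$\mathit{map} : (n :_{(0,5)} \mathsf{nat}) \to (a :_{(0,n+1)} \mathsf{Type}) \to (b :_{(0,n+1)} \mathsf{Type}) \to$
$\qquad (f :_{(n,0)} ((x :_{(1,0)} a) \to b)) \to (xs :_{(1,0)} \mathsf{Vec}\ n\ a) \to \mathsf{Vec}\ n\ b$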


This type provides strong guarantees: the only well-typed implementations do 
the correct thing, up to permutations of the result vector. Without the grading, 
an implementation could apply f fewer than n times, replicating some of the 
transformed elements; here we know that f must be applied exactly n-times. 

A further appealing possibility for GRTT is to allow the semiring to be defined 
internally, rather than as a meta-level parameter, leveraging dependent types for 
proofs of key properties. An implementation could specify what is required for a 
semiring instance, e.g., a record type capturing the operations and properties of a 
semiring. The rules of GRTT could then be extended, similarly to the extension 
to first-class grades, with the provision of the semiring(s) coming from GRTT 
terms. Thus, anywhere with a grading premise $(\Delta \mid \sigma_1 \mid \sigma_2) \odot \Gamma \vdash r : R$ would
also require a premise $(\Delta \mid \sigma_2 \mid 0) \odot \Gamma \vdash R : \mathsf{Semiring}$. This opens up the ability
for programmers and library developers to provide custom modes of resource 
tracking with their libraries, allowing domain-specific program verification. 


Conclusions The paradigm of ‘grading’ exposes the inherent structure of a type 
theory, proof theory, or semantics by matching the underlying structure with 
some algebraic structure augmenting the types. This idea has been employed for 
reasoning about side effects via graded monads [35], and reasoning about data 
flow as discussed here by semiring grading. Richer algebras could be employed 
to capture other aspects, such as ordered logics in which the exchange rule can 
be controlled via grading (existing work has done this via modalities [34]). 

We developed the core of grading in the context of dependent-types, treating 
types and terms equally (as one comes to expect in dependent-type theories). 
The tracking of data flow in types appears complex since we must account for 
how variables are used to form types in both the context and in the subject 
type, making sure not to repeat context formation use. The result however is 
a powerful system for studying dependencies in type theories, as shown by our 
ability to study different theories just by specialising grades. Whilst not yet a
fully fledged implementation, Gerty is a useful test bed for further exploration.

Abstract. The termination behavior of probabilistic programs depends on the
outcomes of random assignments. Almost sure termination (AST) is concerned 
with the question whether a program terminates with probability one on all possible 
inputs. Positive almost sure termination (PAST) focuses on termination in a finite 
expected number of steps. This paper presents a fully automated approach to the 
termination analysis of probabilistic while-programs whose guards and expressions 
are polynomial expressions. As proving (positive) AST is undecidable in general, 
existing proof rules typically provide sufficient conditions. These conditions mostly 
involve constraints on supermartingales. We consider four proof rules from the 
literature and extend these with generalizations of existing proof rules for (P)AST. 
We automate the resulting set of proof rules by effectively computing asymptotic 
bounds on polynomials over the program variables. These bounds are used to decide 
the sufficient conditions — including the constraints on supermartingales — of a 
proof rule. Our software tool AMBER can thus check AST, PAST, as well as their 
negations for a large class of polynomial probabilistic programs, while carrying out 
the termination reasoning fully with polynomial witnesses. Experimental results 
show the merits of our generalized proof rules and demonstrate that AMBER can 
handle probabilistic programs that are out of reach for other state-of-the-art tools. 


Keywords: Probabilistic Programming · Almost sure Termination · Martingales
· Asymptotic Bounds · Linear Recurrences


1 Introduction 


Classical program termination. Termination is a key property in program analysis [16]. 
The question whether a program terminates on all possible inputs — the universal halting 
problem — is undecidable. Proof rules based on ranking functions have been developed that 
impose sufficient conditions implying (non-)termination. Automated termination check- 
ing has given rise to powerful software tools such as AProVE [21] and NaTT [44] (using 
term rewriting), and UltimateAutomizer [26] (using automata theory). These tools have
been shown to determine the termination of several intricate programs. The industrial
tool Terminator [15] has taken termination proving into practice and is able to prove
termination, or even more general liveness properties, of, e.g., device driver software.
Rather than seeking a single ranking function, it takes a disjunctive termination argument
using sets of ranking functions. Other results include termination proving methods for
specific program classes such as linear and polynomial programs, see, e.g., [9,24].


(a)                          (b)
x := 10                      x := 10
while x > 0 do               while x > 0 do
  x := x+1 [1/2] x-1           x := x-1 [1/2] x+2
end                          end

(c)                          (d)
x := 0, y := 0               x := 10, y := 0
while x^2 + y^2 < 100 do     while x > 0 do
  x := x+1 [1/2] x-1           y := y+1
  y := y+x [1/2] y-x           x := x+4y [1/2] x-y^2
end                          end


Fig. 1: Examples of probabilistic programs in our probabilistic language. Program 1a is a symmetric
1D random walk. The program is almost surely terminating (AST) but not positively almost surely
terminating (PAST). Program 1b is not AST. Programs 1c and 1d contain dependent variable
updates with polynomial guards and both programs are PAST.


Termination of probabilistic programs. Probabilistic programs extend sequential
programs with the ability to draw samples from probability distributions. They are used,
e.g., for encoding randomized algorithms, planning in AI, security mechanisms, and in
cognitive science. In this paper, we consider probabilistic while-programs with discrete
probabilistic choices, in the vein of the seminal works [34] and [37]. Termination of
probabilistic programs differs from the classical halting problem in several respects, e.g.,
probabilistic programs may exhibit diverging runs that have probability mass zero in total.
Such programs do not always terminate, but terminate with probability one: they almost
surely terminate. An example of such a program is given in Figure 1a, where variable x is
incremented by 1 with probability 1/2, and otherwise decremented by this amount. This
program encodes a one-dimensional (1D) left-bounded random walk starting at position
10. Another important difference to classical termination is that the expected number
of program steps until termination may be infinite, even if the program almost surely
terminates. Thus, almost sure termination (AST) does not imply that the expected number
of steps until termination is finite. Programs that have a finite expected runtime are referred
to as positively almost surely terminating (PAST). Figure 1c is a sample program that is
PAST. While PAST implies AST, the converse does not hold, as evidenced by Figure 1a:
the program of Figure 1a terminates with probability one but needs infinitely many steps
on average to reach x = 0, hence is not PAST. (The terminology AST and PAST was coined
in [8] and has its roots in the theory of Markov processes.)


Proof rules for AST and PAST. Proving termination of probabilistic programs is hard:
AST for a single input is as hard as the universal halting problem, whereas PAST is even
harder [30]. Termination analysis of probabilistic programs is currently attracting quite
some attention. It is not just of theoretical interest. For instance, a popular way to analyze
probabilistic programs in machine learning is by using some advanced form of simulation.
If, however, a program is not PAST, the simulation may take forever. In addition, the use
of probabilistic programs in safety-critical environments [2,7,20] necessitates providing
formal guarantees on termination. Different techniques are considered for probabilistic
program termination, ranging from probabilistic term rewriting [3], sized types [17], and
Büchi automata theory [14], to weakest pre-condition calculi for checking PAST [31]. A
large body of works considers proof rules that provide sufficient conditions for proving
AST, PAST, or their negations. These rules are based on martingale theory, in particular
supermartingales. Supermartingales are stochastic processes that can be viewed (phrased
in a simplified manner) as the probabilistic analog of ranking functions: the value of a
random variable represents the “value” of the function at the beginning of a loop iteration.
Successive random variables model the evolution of the program loop. Being a
supermartingale means that the expected value of the random variables at the end of a loop
iteration does not exceed its value at the start of the iteration. Constraints on
supermartingales form the essential part of proof rules.
For example, the AST proof rule in [38] requires the existence of a supermartingale whose
value decreases by at least a certain amount with at least a certain probability on each loop
iteration. Intuitively speaking, the closer the supermartingale comes to zero (indicating
termination), the more probable it is that it increases more. The AST proof rule in [38] is
applicable to prove AST for the program in Figure 1a; yet, it cannot be used to prove PAST of
Figures 1c-1d. On the other hand, the PAST proof rule in [10,19] requires that the expected
decrease of the supermartingale on each loop iteration is at least some positive constant
$\epsilon$ and that on loop termination its value is at most zero, very similar to the usual
constraint on ranking functions. While [10,19] can be used to prove the program in Figure 1c
to be PAST, these works cannot be used for Figure 1a. They cannot be used for proving
Figure 1d to be PAST either. The rule for showing non-AST [13] requires the supermartingale
to be repulsing. This intuitively means that the supermartingale decreases on average by at
least $\epsilon$ and is positive on termination. Figuratively speaking, it repulses terminating states.
It can be used to prove the program in Figure 1b to be not AST. In summary, while existing
works for proving AST, PAST, and their negations are generic in nature, they are also
restricted to certain classes of probabilistic programs. In this paper, we propose relaxed versions
of existing proof rules for probabilistic termination that turn out to treat quite a number of
programs that could not be proven otherwise (Section 4). In particular, (non-)termination
of all four programs of Figure 1 can be proven using our proof rules.


Automated termination checking of AST and PAST: Whereas there is a large body of 
techniques and proof rules, software tool support to automate checking termination of 
probabilistic programs is still in its infancy. This paper presents novel algorithms to 
automate various proof rules for probabilistic programs: the three aforementioned proof 
rules [10,19,38,13] and a variant of the non-AST proof rule to prove non-PAST [13]. 
We also present relaxed versions of each of the proof rules, going beyond the state- 
of-the-art in the termination analysis of probabilistic programs. We focus on so-called 
Prob-solvable loops, extending [4]. Namely, we define Prob-solvable loops as probabilis- 
tic while-programs whose guards compare two polynomials (over program variables) 
and whose body is a sequence of random assignments with polynomials as right-hand 
side such that a variable x, say, only depends on variables preceding x in the loop body. 
While restrictive, Prob-solvable loops cover a vast set of interesting probabilistic programs 
(see Remark 1). An essential property of our programs is that the statistical moments of 
program variables can be obtained as closed-form formulas [4]. The key of our algorithmic
approach is a procedure for computing asymptotic lower, upper and absolute bounds
on polynomial expressions over program variables in our programs (Section 5). This
enables a novel method for automating probabilistic termination and non-termination
proof rules based on (super)martingales, going beyond the state-of-the-art in probabilistic
termination. Our relaxed proof rules allow us to fully automate (P)AST analysis by using
only polynomial witnesses. Our experiments provide practical evidence that polynomial
witnesses within Prob-solvable loops are sufficient to certify most examples from the
literature and even beyond (Section 6).


3 For automation, the proof rule of [38] is considered for constant decrease and probability
functions.


Our termination tool AMBER. We have implemented our algorithmic approach in the
publicly available tool AMBER. It exploits asymptotic bounds over polynomial
martingales and uses the tool MORA [4] for computing the first-order moments of program
variables and the computer algebra system package diofant. It employs over- and
under-approximations realized by a simple static analysis. AMBER establishes probabilistic
termination in a fully automated manner and has the following unique characteristics:

— it includes the first implementation of the AST proof rule of [38], and

— it is the first tool capable of certifying AST for programs that are not PAST and cannot
be split into PAST subprograms, and

— it is the first tool that brings the various proof rules under a single umbrella: AST, PAST,
non-AST and non-PAST.

An experimental evaluation on various benchmarks shows that: (1) AMBER is superior to
existing tools for automating PAST [42] and AST [10], (2) the relaxed proof rules enable
proving substantially more programs, and (3) AMBER is able to automate the termination
checking of intricate probabilistic programs (within the class of programs considered)
that could not be automatically handled so far (Section 6). For example, AMBER solves
23 termination benchmarks that no other automated approach could so far handle.


Main contributions. To summarize, the main contributions of this paper are: 

1. Relaxed proof rules for (non-)termination, enabling treating a wider class of programs
(Section 4).

2. Efficient algorithms to compute asymptotic bounds on polynomial expressions of
program variables (Section 5).

3. Automation: a realisation of our algorithms in the tool AMBER (Section 6).

4. Experiments showing the superiority of AMBER over existing tools for proving (P)AST
(Section 6).


2 Preliminaries 


We denote by $\mathbb{N}$ and $\mathbb{R}$ the set of natural and real numbers, respectively. Further, let $\overline{\mathbb{R}}$
denote $\mathbb{R} \cup \{+\infty, -\infty\}$, $\mathbb{R}_0^+$ the non-negative reals and $\mathbb{R}[x_1,\dots,x_m]$ the polynomial ring
in $x_1,\dots,x_m$ over $\mathbb{R}$. We write $x := E_{(1)}\ [p_1]\ E_{(2)}\ [p_2] \dots [p_{m-1}]\ E_{(m)}$ for the probabilistic
update of program variable $x$, denoting the execution of $x := E_{(j)}$ with probability $p_j$,
for $j = 1,\dots,m-1$, and the execution of $x := E_{(m)}$ with probability $1 - \sum_{i=1}^{m-1} p_i$, where
$m \in \mathbb{N}$. We write indices of expressions over program variables in round brackets and
use $E_i$ for the stochastic process induced by expression $E$. This section introduces our
programming language extending Prob-solvable loops [4] and defines the probability
space introduced by such programs. Let $\mathbb{E}$ denote the expectation operator with respect
to a probability space. We assume the reader to be familiar with probability theory [33].


2.1 Programming Model: Prob-Solvable Loops 


Prob-solvable loops [4] are syntactically restricted probabilistic programs with polynomial 
expressions over program variables. The statistical higher-order moments of program vari- 
ables, like expectation and variance of such loops, can always be computed as functions 
of the loop counter. In this paper, we extend Prob-solvable loops with polynomial loop 
guards in order to study their termination behavior, as follows. 


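Definition 1 (Prob-solvable loop $\mathcal{L}$). A Prob-solvable loop $\mathcal{L}$ with real-valued variables
$x_{(1)},\dots,x_{(m)}$, where $m \in \mathbb{N}$, is a program of the form: $\mathcal{I}_\mathcal{L}$ while $\mathcal{G}_\mathcal{L}$ do $\mathcal{U}_\mathcal{L}$ end, with

— (Init) $\mathcal{I}_\mathcal{L}$ is a sequence $x_{(1)} := r_{(1)},\dots,x_{(m)} := r_{(m)}$ of $m$ assignments, with $r_{(j)} \in \mathbb{R}$

— (Guard) $\mathcal{G}_\mathcal{L}$ is a strict inequality $P > Q$, where $P, Q \in \mathbb{R}[x_{(1)},\dots,x_{(m)}]$

— (Update) $\mathcal{U}_\mathcal{L}$ is a sequence of $m$ probabilistic updates of the form


$x_{(j)} := a_{(j1)} x_{(j)} + P_{(j1)}\ [p_{j1}]\ a_{(j2)} x_{(j)} + P_{(j2)}\ [p_{j2}] \dots [p_{j(l_j-1)}]\ a_{(jl_j)} x_{(j)} + P_{(jl_j)}$,


where $a_{(jk)} \in \mathbb{R}_0^+$ are constants, $P_{(jk)} \in \mathbb{R}[x_{(1)},\dots,x_{(j-1)}]$ are polynomials, $p_{(jk)} \in [0,1]$
and $\sum_k p_{jk} < 1$.


If $\mathcal{L}$ is clear from the context, the subscript $\mathcal{L}$ is omitted from $\mathcal{I}_\mathcal{L}$, $\mathcal{G}_\mathcal{L}$, and $\mathcal{U}_\mathcal{L}$. Figure 1
gives four example Prob-solvable loops.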
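
To make the shape of Definition 1 concrete, the following is a minimal sketch of how such a loop could be encoded and stepped exactly. The dictionary layout and the names `LOOP_1C` and `step_distribution` are hypothetical and chosen only for illustration (this is not AMBER's input format), every branch probability is listed explicitly rather than leaving the last one implicit, and the program text is an assumed reading of Figure 1c.

```python
from fractions import Fraction

# Hypothetical encoding (not AMBER's input format): each variable update is a
# list of (probability, function-of-state) branches, applied in program order.
HALF = Fraction(1, 2)

# Assumed reading of Figure 1c:
#   x := 0, y := 0
#   while x^2 + y^2 < 100 do  x := x+1 [1/2] x-1;  y := y+x [1/2] y-x  end
LOOP_1C = {
    "init": {"x": Fraction(0), "y": Fraction(0)},
    "guard": lambda s: s["x"] ** 2 + s["y"] ** 2 < 100,
    "updates": [
        ("x", [(HALF, lambda s: s["x"] + 1), (HALF, lambda s: s["x"] - 1)]),
        ("y", [(HALF, lambda s: s["y"] + s["x"]), (HALF, lambda s: s["y"] - s["x"])]),
    ],
}


def step_distribution(loop, state):
    """Return the exact successor distribution after one loop iteration."""
    dist = [(Fraction(1), dict(state))]
    for var, branches in loop["updates"]:
        nxt = []
        for prob, s in dist:
            for p, f in branches:
                t = dict(s)
                t[var] = f(s)  # later updates see the values assigned earlier
                nxt.append((prob * p, t))
        dist = nxt
    return dist


successors = step_distribution(LOOP_1C, LOOP_1C["init"])
```

Because each update only reads variables assigned earlier in the body (or the variable itself), the successor distribution is obtained by composing the branch lists in order.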


Remark 1 (Prob-solvable expressiveness). The enforced order of assignments in the loop 
body of Prob-solvable loops seems restrictive. However, many non-trivial probabilistic 
programs can be naturally modeled as succinct Prob-solvable loops. These include com- 
plex stochastic processes such as 2D random walks and dynamic Bayesian networks [5]. 
Almost all existing benchmarks on automated probabilistic termination analysis fall within 
the scope of Prob-solvable loops (cf. Section 6). 


In the sequel, we consider an arbitrary Prob-solvable loop $\mathcal{L}$ and provide all definitions
relative to $\mathcal{L}$. The semantics of $\mathcal{L}$ is defined next, by associating $\mathcal{L}$ with a probability space.


2.2 Canonical Probability Space 


A probabilistic program, and thus a Prob-solvable loop, can be semantically described as 
a probabilistic transition system [10] or as a probabilistic control flow graph [13], which 
in turn induce an infinite Markov chain (MC)⁴. An MC is associated with a sequence
space [33], a special probability space. In the sequel, we associate $\mathcal{L}$ with the sequence
space of its corresponding MC, similarly as in [25].


4 In fact, [13] considers Markov decision processes, but in the absence of non-determinism in
Prob-solvable loops, Markov chains suffice for our purpose.

Definition 2 (State, Run of $\mathcal{L}$). The state of Prob-solvable loop $\mathcal{L}$ over $m$ variables is a
vector $s \in \mathbb{R}^m$. Let $s[j]$ or $s[x_{(j)}]$ denote the $j$-th component of $s$, representing the value of
the variable $x_{(j)}$ in state $s$. A run $\vartheta$ of $\mathcal{L}$ is an infinite sequence of states.


Note that any infinite sequence of states is a run. Infeasible runs will however be assigned
measure 0. We write $s \models B$ to denote that the logical formula $B$ holds in state $s$.


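Definition 3 (Loop Space of $\mathcal{L}$). The Prob-solvable loop $\mathcal{L}$ induces a canonical filtered
probability space $(\Omega^{\mathcal{L}}, \Sigma^{\mathcal{L}}, (\mathcal{F}_i^{\mathcal{L}})_{i \in \mathbb{N}}, \mathbb{P}^{\mathcal{L}})$, called loop space, where

— the sample space $\Omega^{\mathcal{L}} := (\mathbb{R}^m)^\omega$ is the set of all program runs,

— the $\sigma$-algebra $\Sigma^{\mathcal{L}}$ is the smallest $\sigma$-algebra containing all cylinder sets
$Cyl(\pi) := \{\pi\vartheta \mid \vartheta \in (\mathbb{R}^m)^\omega\}$ for all finite prefixes $\pi \in (\mathbb{R}^m)^+$, that is
$\Sigma^{\mathcal{L}} := \langle\{Cyl(\pi) \mid \pi \in (\mathbb{R}^m)^+\}\rangle_\sigma$,

— the filtration $(\mathcal{F}_i^{\mathcal{L}})_{i \in \mathbb{N}}$ contains the smallest $\sigma$-algebras containing all cylinder sets for
all prefixes of length $i+1$, i.e. $\mathcal{F}_i^{\mathcal{L}} := \langle\{Cyl(\pi) \mid \pi \in (\mathbb{R}^m)^+, |\pi| = i+1\}\rangle_\sigma$,

— the probability measure $\mathbb{P}^{\mathcal{L}}$ is defined as $\mathbb{P}^{\mathcal{L}}(Cyl(\pi)) := p(\pi)$, where $p$ is given by


$$p(s) := \mu_{\mathcal{I}}(s), \qquad p(\pi s s') := \begin{cases} p(\pi s) \cdot [s' = s], & \text{if } s \models \neg\mathcal{G} \\ p(\pi s) \cdot \mu_{\mathcal{U}}(s, s'), & \text{if } s \models \mathcal{G}. \end{cases}$$


$\mu_{\mathcal{I}}(s)$ denotes the probability that, after initialization $\mathcal{I}_\mathcal{L}$, the loop $\mathcal{L}$ is in state $s$.
$\mu_{\mathcal{U}}(s, s')$ denotes the probability that, after one loop iteration starting in state $s$, the
resulting program state is $s'$. $[\dots]$ represent the Iverson brackets, i.e. $[s' = s]$ is 1 iff $s' = s$.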


Intuitively, $\mathbb{P}(Cyl(\pi))$ is the probability that prefix $\pi$ is the sequence of the first $|\pi|$
program states when executing $\mathcal{L}$. Moreover, the $\sigma$-algebra $\mathcal{F}_i$ intuitively captures the
information about the program run after the loop body $\mathcal{U}$ has been executed $i$ times. We
note that the effect of the loop body $\mathcal{U}$ is considered as atomic.

In order to formalize termination properties of a Prob-solvable loop $\mathcal{L}$, we define the
looping time of $\mathcal{L}$ to be a random variable in $\mathcal{L}$’s loop space.


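Definition 4 (Looping Time of $\mathcal{L}$). The looping time of $\mathcal{L}$ is the random variable
$T^{\neg\mathcal{G}} : \Omega \to \mathbb{N} \cup \{\infty\}$, where $T^{\neg\mathcal{G}}(\vartheta) := \inf\{i \in \mathbb{N} \mid \vartheta_i \models \neg\mathcal{G}\}$.


Intuitively, the looping time $T^{\neg\mathcal{G}}$ maps a program run of $\mathcal{L}$ to the index of the first state
falsifying the loop guard $\mathcal{G}$ of $\mathcal{L}$, or to $\infty$ if no such state exists. We now formalize termination
properties of $\mathcal{L}$ using the looping time $T^{\neg\mathcal{G}}$.


Definition 5 (Termination of $\mathcal{L}$). The Prob-solvable loop $\mathcal{L}$ is AST if $\mathbb{P}(T^{\neg\mathcal{G}} < \infty) = 1$.
$\mathcal{L}$ is PAST if $\mathbb{E}(T^{\neg\mathcal{G}}) < \infty$.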
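
As a small illustration of Definition 4, the sketch below evaluates the looping time on a finite prefix of a run. The function name `looping_time` and the example prefixes are hypothetical; since only a prefix is inspected, a result of `None` means that the prefix alone does not determine the looping time (it may be larger, or infinite).

```python
def looping_time(run_prefix, guard):
    """Index of the first state falsifying the guard, or None if the prefix
    contains no such state (the looping time may then be larger or infinite)."""
    for i, state in enumerate(run_prefix):
        if not guard(state):
            return i
    return None


guard = lambda x: x > 0  # guard of the 1D random walk, states are values of x

print(looping_time([2, 1, 2, 1, 0, 0], guard))  # 4
print(looping_time([2, 3, 4], guard))           # None
```

The repeated final state in the first prefix mirrors the Iverson-bracket case of the loop space: once the guard is falsified, the run stays in the same state.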


2.3 Martingales 


While for arbitrary probabilistic programs, deciding whether $\mathbb{P}(T^{\neg\mathcal{G}} < \infty) = 1$ or
$\mathbb{E}(T^{\neg\mathcal{G}}) < \infty$ holds is undecidable, sufficient conditions for AST, PAST and their negations
have been developed [10,19,38,13]. These works use (super)martingales, which are special stochastic
processes. In this section, we adapt the general setting of martingale theory to a
Prob-solvable loop $\mathcal{L}$ and then formalize sufficient termination conditions for $\mathcal{L}$ in Section 3.

Definition 6 (Stochastic Process of $\mathcal{L}$). Every arithmetic expression $E$ over the program
variables of $\mathcal{L}$ induces the stochastic process $(E_i)_{i \in \mathbb{N}}$, $E_i : \Omega \to \mathbb{R}$ with $E_i(\vartheta) := E(\vartheta_i)$.
For a run $\vartheta$ of $\mathcal{L}$, $E_i(\vartheta)$ is the evaluation of $E$ in the $i$-th state of $\vartheta$.


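In the sequel, for a boolean condition $B$ over program variables $x$ of $\mathcal{L}$, we write $B_i$ to
refer to the result of substituting $x$ by $x_i$ in $B$.


Definition 7 (Martingales). Let $(\Omega, \Sigma, (\mathcal{F}_i)_{i \in \mathbb{N}}, \mathbb{P})$ be a filtered probability space and
$(M_i)_{i \in \mathbb{N}}$ be an integrable stochastic process adapted to $(\mathcal{F}_i)_{i \in \mathbb{N}}$. Then $(M_i)_{i \in \mathbb{N}}$ is a
martingale if $\mathbb{E}(M_{i+1} \mid \mathcal{F}_i) = M_i$ (or equivalently $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = 0$). Moreover,
$(M_i)_{i \in \mathbb{N}}$ is called a supermartingale (SM) if $\mathbb{E}(M_{i+1} \mid \mathcal{F}_i) \le M_i$ (or equivalently
$\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) \le 0$). For an arithmetic expression $E$ over the program variables of $\mathcal{L}$,
the conditional expected value $\mathbb{E}(E_{i+1} - E_i \mid \mathcal{F}_i)$ is called the martingale expression of $E$.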


3 Proof Rules for Probabilistic Termination 


While AST and PAST are undecidable in general [30], sufficient conditions, called proof
rules, for AST and PAST have been introduced, see e.g. [10,19,38,13]. In this section, we
survey four proof rules, adapted to Prob-solvable loops. In the sequel, a pure invariant
is a loop invariant in the classical deterministic sense [27]. Based on the probability space
corresponding to $\mathcal{L}$, a pure invariant holds before and after every iteration of $\mathcal{L}$.


3.1 Positive Almost Sure Termination (PAST) 


The proof rule for PAST introduced in [10] relies on the notion of ranking supermartingales
(RSMs). An RSM is an SM that decreases by a fixed positive $\epsilon$ on average at every loop
iteration. Intuitively, RSMs resemble ranking functions for deterministic programs, yet
for probabilistic programs.


Theorem 1 (Ranking-Supermartingale-Rule (RSM-Rule) [10], [19]). Let $M : \mathbb{R}^m \to \mathbb{R}$
be an expression over the program variables of $\mathcal{L}$ and $I$ a pure invariant of $\mathcal{L}$. Assume
the following conditions hold for all $i \in \mathbb{N}$:


1. (Termination) $\mathcal{G} \land I \implies M > 0$
2. (RSM Condition) $\mathcal{G}_i \land I_i \implies \mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) \le -\epsilon$, for some $\epsilon > 0$.


Then, $\mathcal{L}$ is PAST. Further, $M$ is called an $\epsilon$-ranking supermartingale.


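Example 1. Consider Figure 1c, set $M := 100 - x^2 - y^2$ and $\epsilon := 2$ and let $I$ be true.
Condition (1) of Theorem 1 trivially holds. Further, $M$ is also an $\epsilon$-ranking supermartingale, as
$\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = 100 - \mathbb{E}(x_{i+1}^2 \mid \mathcal{F}_i) - \mathbb{E}(y_{i+1}^2 \mid \mathcal{F}_i) - 100 + x_i^2 + y_i^2 = -2 - x_i^2 \le -2$.
That is because $\mathbb{E}(x_{i+1}^2 \mid \mathcal{F}_i) = x_i^2 + 1$ and $\mathbb{E}(y_{i+1}^2 \mid \mathcal{F}_i) = y_i^2 + x_i^2 + 1$. Figure 1c is thus
proved PAST using the RSM-Rule.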
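
The martingale expression of Example 1 can be cross-checked numerically by enumerating the four equally likely branch combinations of one loop iteration. The sketch below assumes the loop body x := x+1 [1/2] x-1; y := y+x [1/2] y-x (with y reading the already updated x), which is the reading consistent with the expectations stated in the example; `drift_1c` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def drift_1c(x, y):
    """Exact E(M_{i+1} - M_i | state (x, y)) for M = 100 - x^2 - y^2."""
    m_now = 100 - x**2 - y**2
    expected_next = Fraction(0)
    for dx, sign in product((1, -1), (1, -1)):
        x_new = x + dx             # x := x+1 [1/2] x-1
        y_new = y + sign * x_new   # y := y+x [1/2] y-x, using the new x
        expected_next += HALF * HALF * (100 - x_new**2 - y_new**2)
    return expected_next - m_now


# The drift equals -2 - x^2 on every sampled integer state, hence is <= -2.
for x, y in product(range(-9, 10), repeat=2):
    assert drift_1c(x, y) == -2 - x**2
```

This is only a spot check on sampled states, not a proof; the symbolic identity in the example covers all states.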


498 M. Moosbrugger et al. 


3.2 Almost Sure Termination (AST) 


Recall that Figure 1a is AST but not PAST, and hence the RSM-Rule cannot be used
for Figure 1a. By relaxing the ranking conditions, the proof rule in [38] uses general
supermartingales to prove AST of programs that are not necessarily PAST.


Theorem 2 (Supermartingale-Rule (SM-Rule) [38]). Let $M : \mathbb{R}^m \to \mathbb{R}_{\ge 0}$ be an expression
over the program variables of $\mathcal{L}$ and $I$ a pure invariant of $\mathcal{L}$. Let $p : \mathbb{R}_{\ge 0} \to (0,1]$ (for
probability) and $d : \mathbb{R}_{\ge 0} \to \mathbb{R}_{>0}$ (for decrease) be antitone (i.e. monotonically decreasing)
functions. Assume the following conditions hold for all $i \in \mathbb{N}$:

1. (Termination) $\mathcal{G} \land I \implies M > 0$

2. (Decrease) $\mathcal{G}_i \land I_i \implies \mathbb{P}(M_{i+1} - M_i \le -d(M_i) \mid \mathcal{F}_i) \ge p(M_i)$

3. (SM Condition) $\mathcal{G}_i \land I_i \implies \mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) \le 0$

Then, $\mathcal{L}$ is AST.


Intuitively, the requirement of $d$ and $p$ being antitone forbids that the “execution progress”
of $\mathcal{L}$ towards termination becomes infinitely small while still being positive.


Example 2. The SM-Rule can be used to prove AST for Figure 1a. Consider $M := x$,
$p := 1/2$, $d := 1$ and $I := true$. Clearly, $p$ and $d$ are antitone. The remaining conditions of
Theorem 2 also hold as (1) $x > 0 \implies x > 0$; (2) $x$ decreases by $d$ with probability $p$ in
every iteration; and (3) $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = x_i - x_i \le 0$.
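
The two quantitative checks of Example 2 can be sketched directly on the one-step change of $M = x$ for the symmetric random walk of Figure 1a. The variable names below are illustrative only; the flag `rsm_possible` records why the same witness cannot serve as a ranking supermartingale.

```python
from fractions import Fraction

# Change of M = x in one iteration of the symmetric walk: +1 or -1, each w.p. 1/2.
changes = [(Fraction(1, 2), 1), (Fraction(1, 2), -1)]
d, p = 1, Fraction(1, 2)  # constant decrease and probability functions

expected_change = sum(prob * delta for prob, delta in changes)
prob_decrease = sum(prob for prob, delta in changes if delta <= -d)

sm_condition = expected_change <= 0        # condition (3): M is a supermartingale
decrease_condition = prob_decrease >= p    # condition (2): decrease by d w.p. >= p
rsm_possible = expected_change < 0         # a strictly negative drift would be needed for an RSM
```

With these constants, conditions (2) and (3) hold, while the zero drift rules out $M = x$ as an $\epsilon$-ranking supermartingale for any $\epsilon > 0$.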


3.3 Non-Termination 


While Theorems 1 and 2 can be used for proving PAST and AST, respectively, they are
not applicable to the analysis of non-terminating Prob-solvable loops. Two sufficient
conditions for certifying the negations of AST and PAST have been introduced in [13] using
so-called repulsing-supermartingales. Intuitively, a repulsing-supermartingale $M$ on
average decreases in every iteration of $\mathcal{L}$ and on termination is non-negative. Figuratively,
$M$ repulses terminating states.


Theorem 3 (Repulsing-AST-Rule (R-AST-Rule) [13]). Let $M : \mathbb{R}^m \to \mathbb{R}$ be an expression
over the program variables of $\mathcal{L}$ and $I$ a pure invariant of $\mathcal{L}$. Assume the following
conditions hold for all $i \in \mathbb{N}$:

1. (Negative) $M_0 < 0$

2. (Non-Termination) $\neg\mathcal{G} \land I \implies M \ge 0$

3. (RSM Condition) $\mathcal{G}_i \land I_i \implies \mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) \le -\epsilon$, for some $\epsilon > 0$

4. (c-Bounded Differences) $|M_{i+1} - M_i| < c$, for some $c > 0$.

Then, $\mathcal{L}$ is not AST. $M$ is called an $\epsilon$-repulsing supermartingale with $c$-bounded differences.


Example 3. Consider Figure 1b and let $M := -x$, $c := 3$, $\epsilon := 1/2$ and $I := true$. All four
above conditions hold: (1) $-x_0 = -10 < 0$; (2) $x \le 0 \implies -x \ge 0$; (3) $\mathbb{E}(M_{i+1} - M_i \mid
\mathcal{F}_i) = -x_i - 1/2 + x_i = -1/2 \le -\epsilon$; and (4) $|x_i - x_{i+1}| < 3$. Thus, Figure 1b is not AST.


While Theorem 3 can prove programs not to be AST, and thus also not PAST, it cannot
be used to prove programs not to be PAST when they are AST. For example, Theorem 3
cannot be used to prove that Figure 1a is not PAST. To address such cases, a variation of the
R-AST-Rule [13] for certifying programs not to be PAST arises by relaxing the condition
$\epsilon > 0$ of the R-AST-Rule to $\epsilon \ge 0$. We refer to this variation as the Repulsing-PAST-Rule
(R-PAST-Rule).


4 Relaxed Proof Rules for Probabilistic Termination


While Theorems 1-3 provide sufficient conditions proving PAST, AST and their negations, 
the applicability to Prob-solvable loops is somewhat restricted. For example, the RSM- 
Rule cannot be used to prove Figure 1d to be PAST using the simple expression M := x, as 
explained in detail with Example 4, but may require more complex witnesses for certifying 
PAST, complicating automation. In this section, we relax the conditions of Theorems 1-3 
by requiring these conditions to only hold “eventually”. A property $P(i)$ parameterized
by a natural number $i \in \mathbb{N}$ holds eventually if there is an $i_0 \in \mathbb{N}$ such that $P(i)$ holds for all
$i \ge i_0$. Our relaxations of probabilistic termination proof rules can intuitively be described
as follows: If $\mathcal{L}$, after a fixed number of steps, almost surely reaches a state from which
the program is PAST or AST, then the program is PAST or AST, respectively. Let us first
illustrate the benefits of reasoning with “eventually” holding properties for probabilistic 
termination in the following example. 


(a)                                  (b)
x := 20, y := 0                      x := 1, y := 2
while x > 0 do                       while x > 0 do
  y := y+1                             y := 1/2·y
  x := x+(y-5) [1/2] x-(y-5)           x := x+1-y [2/3] x-1+y
end                                  end


Fig. 2: Prob-solvable loops which require our relaxed proof rules for termination analysis. 


Example 4 (Limits of the RSM-Rule and SM-Rule). Consider Figure 1d. Setting $M := x$,
we have the martingale expression $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = -y_i^2/2 + y_i + 3/2 = -i^2/2 + i + 3/2$.
Since $\mathbb{E}(x_{i+1} - x_i \mid \mathcal{F}_i)$ is non-negative for $i \in \{0,1,2,3\}$, we conclude that $M$ is not an
RSM. However, Figure 1d either terminates within the first three iterations or, after three
loop iterations, is in a state such that the RSM-Rule is applicable. Therefore, Figure 1d is
PAST but the RSM-Rule cannot directly prove this using $M := x$. A similar restriction of the
SM-Rule can be observed for Figure 2a. By considering $M := x$, we derive the martingale
expression $\mathbb{E}(x_{i+1} - x_i \mid \mathcal{F}_i) = 0$, implying that $M$ is a martingale for Figure 2a. However,
the decrease function $d$ for the SM-Rule cannot be defined because, for example, in the
fifth loop iteration of Figure 2a, there is no progress as $x$ is almost surely updated with its
previous value. However, after the fifth iteration of Figure 2a, $x$ always decreases by at
least 1 with probability 1/2 and all conditions of the SM-Rule are satisfied. Thus, Figure 2a
either terminates within the first five iterations or reaches a state from which it terminates
almost surely. Consequently, Figure 2a is AST but the SM-Rule cannot directly prove it
using $M := x$.
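
The "eventually" behaviour in Example 4 can be made visible by tabulating the martingale expression of $M = x$ per iteration. The sketch below assumes the loop body y := y+1; x := x+4y [1/2] x-y^2 for Figure 1d with $y_i = i$, a reading chosen because it reproduces the closed form stated in the example; `drift_1d` and `closed_form` are hypothetical helper names, and the threshold is only searched on a finite range.

```python
from fractions import Fraction


def drift_1d(i):
    """E(x_{i+1} - x_i | F_i) for the assumed loop body
    y := y+1; x := x+4y [1/2] x-y^2, where y_i = i."""
    y_next = i + 1
    return Fraction(1, 2) * (4 * y_next) + Fraction(1, 2) * (-(y_next**2))


def closed_form(i):
    """The expression -i^2/2 + i + 3/2 from Example 4."""
    return Fraction(-i * i, 2) + i + Fraction(3, 2)


drifts = [drift_1d(i) for i in range(8)]
assert all(drift_1d(i) == closed_form(i) for i in range(100))

# Smallest index from which the drift is strictly negative on the checked range.
i0 = next(i for i in range(100) if all(drift_1d(j) < 0 for j in range(i, 100)))
print(i0, drift_1d(i0))  # 4 -5/2
```

The drift is non-negative for the first four indices and negative afterwards, which is exactly the situation the relaxed RSM-Rule is designed for.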


We therefore relax the RSM-Rule and SM-Rule of Theorems 1 and 2 as follows.


Theorem 4 (Relaxed Termination Proof Rules). For the RSM-Rule to certify PAST of
$\mathcal{L}$, it is sufficient that conditions (1)-(2) of Theorem 1 hold eventually (instead of for all
$i \in \mathbb{N}$). Similarly, for the SM-Rule to certify AST of $\mathcal{L}$, it is sufficient that conditions (1)-(3)
of Theorem 2 hold eventually.

Proof. We prove the relaxation of the RSM-Rule. The proof of the relaxed SM-Rule is
analogous. Let $\mathcal{L} := \mathcal{I}$ while $\mathcal{G}$ do $\mathcal{U}$ end be as in Definition 1. Assume $\mathcal{L}$ satisfies the
conditions (1)-(2) of Theorem 1 after some $i_0 \in \mathbb{N}$. We construct the following probabilistic
program $\mathcal{P}$, where $i$ is a new variable not appearing in $\mathcal{L}$:


$\mathcal{I}$; $i := 0$
while $i < i_0$ do $\mathcal{U}$; $i := i+1$ end          (1)
while $\mathcal{G}$ do $\mathcal{U}$ end


We first argue that if $\mathcal{P}$ is PAST, then so is $\mathcal{L}$. Assume $\mathcal{P}$ to be PAST. Then, the looping
time of $\mathcal{L}$ is either bounded by $i_0$ or it is PAST, by the definition of $\mathcal{P}$. In both cases, $\mathcal{L}$
is PAST. Finally, observe that $\mathcal{P}$ is PAST if and only if its second while-loop is PAST.
However, the second while-loop of $\mathcal{P}$ can be certified to be PAST using the RSM-Rule
and additionally using $i \ge i_0$ as an invariant.


Remark 2. The central point of our proof rule relaxations is that they allow for simpler
witnesses. While for Example 4 it can be checked that $M := x + 2^{y+5}$ is an RSM, the
example illustrates that the relaxed proof rule allows for a much simpler PAST witness
(linear instead of exponential). This simplicity is key for automation.


Similar to Theorem 4, we relax the R-AST-Rule and the R-PAST-Rule. However,
compared to Theorem 4, it is not enough for a non-termination proof rule to certify non-AST
from some state onward, because $\mathcal{L}$ may never reach this state as it might terminate earlier.
Therefore, a necessary assumption when relaxing non-termination proof rules comes with
ensuring that $\mathcal{L}$ has a positive probability of reaching the state after which a proof rule
witnesses non-termination. This is illustrated in the following example.


Example 5 (Limits of the R-AST-Rule). Consider Figure 2b and set $M := -x$. As a result,
we get $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = y_i/6 - 1/3 = 2^{-i}/3 - 1/3$. Thus, $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i) = 0$ for
$i = 0$, implying that $M$ cannot be an $\epsilon$-repulsing supermartingale with $\epsilon > 0$ for all $i \in \mathbb{N}$.
However, after the first iteration of $\mathcal{L}$, $M$ satisfies all requirements of the R-AST-Rule.
Moreover, $\mathcal{L}$ always reaches the second iteration because in the first iteration $x$ almost
surely does not change. From this follows that Figure 2b is not AST.
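
The drift computation of Example 5 can be reproduced with exact rational arithmetic. The sketch assumes the loop body of Figure 2b is y := 1/2·y; x := x+1-y [2/3] x-1+y with $y_0 = 2$ (the branch probability 2/3 is the value consistent with the stated martingale expression); `drift_2b` is a hypothetical helper name, and the bound $\epsilon = 1/6$ is only checked on a finite range of iterations.

```python
from fractions import Fraction


def drift_2b(i):
    """E(M_{i+1} - M_i | F_i) for M = -x and the assumed loop body
    y := 1/2*y; x := x+1-y [2/3] x-1+y, with y_0 = 2."""
    y_next = Fraction(2) / 2 ** (i + 1)  # y_{i+1} = 2^{-i}
    dx = Fraction(2, 3) * (1 - y_next) + Fraction(1, 3) * (-1 + y_next)
    return -dx


# Zero drift in the very first iteration, so M is not epsilon-repulsing for all i.
assert drift_2b(0) == 0
# Agreement with the closed form 2^{-i}/3 - 1/3 from Example 5.
assert all(drift_2b(i) == Fraction(1, 3 * 2**i) - Fraction(1, 3) for i in range(50))
# From the second iteration on, the drift is at most -1/6 on the checked range.
assert all(drift_2b(i) <= -Fraction(1, 6) for i in range(1, 50))
```

So the RSM condition of the R-AST-Rule fails only at $i = 0$ and holds with a fixed positive $\epsilon$ afterwards, matching the relaxation discussed in the example.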


The following theorem formalizes the observation of Example 5 relaxing the R-AST- 
Rule and R-PAST-Rule of Theorem 3. 


Theorem 5 (Relaxed Non-Termination Proof Rules for $\mathcal{L}$). For the R-AST-Rule to certify
non-AST for $\mathcal{L}$ (Theorem 3), as well as for the R-PAST-Rule to certify non-PAST for $\mathcal{L}$
(Theorem 3), if $\mathbb{P}(M_{i_0} < 0) > 0$ for some $i_0 \ge 0$, it suffices that conditions (2)-(4) hold for
all $i \ge i_0$ (instead of for all $i \in \mathbb{N}$).


The proof of Theorem 5 is similar to the one of Theorem 4 and available in [40]. In 
what follows, whenever we write RSM-Rule, SM-Rule, R-AST-Rule or R-PAST-Rule we 
refer to our relaxed versions of the proof rules. 
5 Algorithmic Termination Analysis through Asymptotic Bounds


The two major challenges when automating reasoning with the proof rules of Sections 3
and 4 are (i) constructing expressions $M$ over the program variables and (ii) proving
inequalities involving $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i)$. In this section, we address these two challenges
for Prob-solvable loops. For the loop guard $\mathcal{G}_{\mathcal{L}} = P > Q$, let $G_{\mathcal{L}}$ denote the polynomial
$P - Q$. As before, if $\mathcal{L}$ is clear from the context, we omit the subscript $\mathcal{L}$. It holds that
$G > 0$ is equivalent to $\mathcal{G}$.


(i) Constructing (super)martingales $M$: For a Prob-solvable loop $\mathcal{L}$, the polynomial $G$
is a natural candidate for the expression $M$ in termination proof rules (RSM-Rule, SM-Rule)
and $-G$ in the non-termination proof rules (R-AST-Rule, R-PAST-Rule). Hence,
we construct potential (super)martingales $M$ by setting $M := G$ for the RSM-Rule and
the SM-Rule, and $M := -G$ for the R-AST-Rule and the R-PAST-Rule. The property
$\mathcal{G} \implies G > 0$, a condition of the RSM-Rule and the SM-Rule, trivially holds. Moreover,
for the R-AST-Rule and R-PAST-Rule the condition $\neg\mathcal{G} \implies -G \ge 0$ is satisfied. The
remaining conditions of the proof rules are:
- RSM-Rule: (a) $\mathcal{G}_i \implies \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) \le -\epsilon$ for some $\epsilon > 0$
- SM-Rule: (a) $\mathcal{G}_i \implies \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) \le 0$ and (b) $\mathcal{G}_i \implies \mathbb{P}(G_{i+1} - G_i \le -d \mid \mathcal{F}_i) \ge p$
  for some $p \in (0,1]$ and $d \in \mathbb{R}^+$ (for the purpose of efficient automation, we restrict the
  functions $d(r)$ and $p(r)$ to be constant)
- R-AST-Rule: (a) $\mathcal{G}_i \implies \mathbb{E}(-G_{i+1} + G_i \mid \mathcal{F}_i) \le -\epsilon$ for some $\epsilon > 0$ and (b) $|G_{i+1} - G_i| < c$,
  for some $c > 0$.
All these conditions express bounds over $G_i$. Choosing $G$ as the potential witness may
seem simplistic. However, Example 4 already illustrated how our relaxed proof rules can
mitigate the need for more complex witnesses (even exponential ones). The computational
effort in our approach does not lie in synthesizing a complex witness but in constructing
asymptotic bounds for the loop guard. Our approach can therefore be seen as complementary
to approaches synthesizing more complex witnesses [10,11,13]. The martingale
expression $\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$ is an expression over program variables, whereas $G_{i+1} - G_i$
cannot be interpreted as a single expression but through a distribution of expressions.


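Definition 8 (One-step Distribution). For expression $H$ over the program variables of
Prob-solvable loop $\mathcal{L}$, let the one-step distribution $\mathcal{U}_{\mathcal{L}}^H$ be defined by $E \mapsto \mathbb{P}(H_{i+1} = E \mid \mathcal{F}_i)$
with support set $supp(\mathcal{U}_{\mathcal{L}}^H) := \{B \mid \mathcal{U}_{\mathcal{L}}^H(B) > 0\}$. We refer to expressions $B \in supp(\mathcal{U}_{\mathcal{L}}^H)$
as branches of $H$.


The notation $\mathcal{U}_{\mathcal{L}}^H$ is chosen to suggest that the loop body $\mathcal{U}_{\mathcal{L}}$ is “applied” to the expression
$H$, leading to a distribution over expressions. Intuitively, the support $supp(\mathcal{U}_{\mathcal{L}}^H)$ of an
expression $H$ contains all possible updates of $H$ after executing a single iteration of $\mathcal{U}_{\mathcal{L}}$.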


(ii) Proving inequalities involving $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i)$: To automate the termination analysis
of $\mathcal{L}$ with the proof rules from Section 3, we need to compute bounds for the expression
$\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$ as well as for the branches of $G$. In addition, our relaxed proof rules from
Section 4 only need asymptotic bounds, i.e. bounds which hold eventually. In Section 5.2,
we propose Algorithm 1 for computing asymptotic lower and upper bounds for any polynomial
expression over program variables of $\mathcal{L}$. Our procedure allows us to derive bounds
for $\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$ and the branches of $G$. Before formalizing our method, let us first
illustrate how reasoning with asymptotic bounds helps to apply termination proof rules to $\mathcal{L}$.




Example 6 (Asymptotic Bounds for the RSM-Rule). Consider the following program: 


    x := 1, y := 0
    while x < 100 do
        y := y + 1
        x := 2x + y^2 [1/2] 1/2·x
    end


Observe $y_i = i$. The martingale expression for $G = 100 - x$ is $\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) =
1/2 (100 - 2x_i - (i+1)^2) + 1/2 (100 - x_i/2) - (100 - x_i) = -x_i/4 - i^2/2 - i - 1/2$. Note that
if the term $-x_i/4$ were not present in $\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$, we could certify the program
to be PAST using the RSM-Rule because $-i^2/2 - i - 1/2 \le -1/2$ for all $i \ge 0$. However,
by taking a closer look at the variable $x$, we observe that it is eventually and almost
surely lower bounded by the function $\alpha \cdot 2^{-i}$ for some $\alpha \in \mathbb{R}^+$. Therefore, eventually
$-x_i/4 \le -\beta \cdot 2^{-i}$ for some $\beta \in \mathbb{R}^+$. Thus, eventually $\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) \le -\gamma \cdot i^2$ for some
$\gamma \in \mathbb{R}^+$. By our RSM-Rule, the program is PAST.

Now, the question arises how the asymptotic lower bound $\alpha \cdot 2^{-i}$ for $x$ can be computed
automatically. In every iteration, $x$ is either updated with $2x + y^2$ or $1/2 \cdot x$. Considering
the updates as recurrences, we have the inhomogeneous parts $y^2$ and $0$. Asymptotic lower
bounds for these parts are $i^2$ and $0$, respectively, where $0$ is the “asymptotically smallest
one”. Taking $0$ as the inhomogeneous part, we construct two recurrences: (1) $l_0 = \alpha$, $l_{i+1} =
2 l_i + 0$ and (2) $l_0 = \alpha$, $l_{i+1} = 1/2 \cdot l_i + 0$, for some $\alpha \in \mathbb{R}^+$. Solutions to these recurrences are
$\alpha \cdot 2^i$ and $\alpha \cdot 2^{-i}$, where the last one is the desired lower bound because it is “asymptotically
smaller”. We will formalize this idea of computing asymptotic bounds in Algorithm 1.
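
As a sanity check of Example 6, the following minimal sketch enumerates all runs of the example loop exactly (with rational arithmetic) for a few iterations. It is only an illustration and not part of the described tool; the helper names `step` and `analyse` are hypothetical. It checks the lower bound $2^{-n} \le x_n$ on every run that is still looping and records the probability that the loop is still running after $n$ iterations.

```python
from fractions import Fraction

def step(states, i):
    """Apply one loop-body iteration to a distribution over values of x.

    `states` maps each value of x to its probability; `i` is the number of
    iterations done so far, so y takes the value i + 1 in this iteration.
    """
    y = i + 1
    nxt = {}
    for x, p in states.items():
        # the two branches 2x + y^2 and x/2 each have probability 1/2
        for new_x in (2 * x + y * y, x / 2):
            nxt[new_x] = nxt.get(new_x, Fraction(0)) + p / 2
    return nxt

def analyse(max_iter):
    """Return (survival, bound_ok): survival[n] is the probability that the
    guard x < 100 still holds after n iterations; bound_ok tells whether
    2^-n <= x_n held on every run that was still looping."""
    running = {Fraction(1): Fraction(1)}  # x_0 = 1 with probability 1
    survival = [Fraction(1)]
    bound_ok = True
    for i in range(max_iter):
        after = step(running, i)
        # keep only the runs whose guard x < 100 still holds
        running = {x: p for x, p in after.items() if x < 100}
        bound_ok = bound_ok and all(x >= Fraction(1, 2 ** (i + 1)) for x in running)
        survival.append(sum(running.values(), Fraction(0)))
    return survival, bound_ok

survival, bound_ok = analyse(14)
print(bound_ok)
print([float(s) for s in survival])
```

From iteration 10 onward the branch $2x + y^2$ always leaves the loop (since $y^2 \ge 100$), so the survival probability halves in every further iteration, which is consistent with the loop being PAST.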


We next present our method for computing asymptotic bounds over martingale expres- 
sions in Sections 5.1-5.2. Based on these asymptotic bounds, in Section 5.3 we introduce 
algorithmic approaches for our proof rules from Section 4, solving our aforementioned 
challenges (i)-(ii) in a fully automated manner (Section 5.4). 


5.1 Prob-solvable Loops and Monomials 


Algorithm 1 computes asymptotic bounds on monomials over program variables in a 
recursive manner. To ensure termination of Algorithm 1, it is important that there are no 
circular dependencies among monomials. By the definition of Prob-solvable loops, this 
indeed holds for program variables (monomials of order 1). Every Prob-solvable loop £ 
comes with an ordering on its variables and every variable is restricted to only depend 
linearly on itself and polynomially on previous variables. Acyclic dependencies naturally 
extend from single variables to monomials. 


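Definition 9 (Monomial Ordering). Let $\mathcal{L}$ be a Prob-solvable loop with variables
$x_{(1)}, \ldots, x_{(m)}$. Let $y_1 = \prod_{j=1}^{m} x_{(j)}^{p_j}$ and $y_2 = \prod_{j=1}^{m} x_{(j)}^{q_j}$, where $p_j, q_j \in \mathbb{N}$, be two monomials
over the program variables. The order $\preceq$ on monomials over the program variables of
$\mathcal{L}$ is defined by $y_1 \preceq y_2 \iff (p_m, \ldots, p_1) \le_{lex} (q_m, \ldots, q_1)$, where $\le_{lex}$ is the lexicographic
order on $\mathbb{N}^m$. The order $\preceq$ is total because $\le_{lex}$ is total. With $y_1 \prec y_2$ we denote
$y_1 \preceq y_2 \land y_1 \ne y_2$.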
To prove acyclic dependencies for monomials we exploit the following fact.

Lemma 1. Let $y_1, y_2, z_1, z_2$ be monomials. If $y_1 \preceq z_1$ and $y_2 \preceq z_2$ then $y_1 \cdot y_2 \preceq z_1 \cdot z_2$.

By structural induction over monomials and Lemma 1, we establish:


Lemma 2 (Monomial Acyclic Dependency). Let $x$ be a monomial over the program
variables of $\mathcal{L}$. For every branch $B \in supp(\mathcal{U}_{\mathcal{L}}^x)$ and monomial $y$ in $B$, $y \preceq x$ holds.


Lemma 2 states that the value of a monomial $x$ over the program variables of $\mathcal{L}$ only
depends on the value of monomials $y$ which precede $x$ in the monomial ordering $\preceq$. This
ensures the dependencies among monomials over the program variables of $\mathcal{L}$ to be acyclic.
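
The monomial ordering of Definition 9 can be illustrated with a small sketch. It assumes a monomial is represented by its exponent tuple $(p_1, \ldots, p_m)$; the function names `leq` and `mul` are hypothetical. The code also checks Lemma 1 exhaustively on a small, finite set of monomials, which is a sanity check rather than a proof.

```python
from itertools import product

def leq(p, q):
    """Order of Definition 9 on exponent tuples (p_1, ..., p_m):
    compare (p_m, ..., p_1) with (q_m, ..., q_1) lexicographically."""
    return tuple(reversed(p)) <= tuple(reversed(q))

def mul(p, q):
    """Multiplying two monomials adds their exponents."""
    return tuple(a + b for a, b in zip(p, q))

# Exhaustive check of Lemma 1 for monomials in two variables, exponents 0..2
monos = list(product(range(3), repeat=2))
lemma1_holds = all(
    leq(mul(y1, y2), mul(z1, z2))
    for y1, y2, z1, z2 in product(monos, repeat=4)
    if leq(y1, z1) and leq(y2, z2)
)
print(lemma1_holds)

# With variable order (x_(1), x_(2)): x_(1)^5 precedes x_(2), because the
# exponent of the later variable is compared first.
print(leq((5, 0), (0, 1)))
```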


5.2 Computing Asymptotic Bounds for Prob-solvable Loops 


The structural result on monomial dependencies from Lemma 2 allows for recursive procedures
over monomials. This is exploited in Algorithm 1 for computing asymptotic bounds
for monomials. The standard Big-O notation does not differentiate between positive and
negative functions, as it considers the absolute value of functions. We, however, need to
differentiate between functions like $2^i$ and $-2^i$. Therefore, we introduce the notions of
Domination and Bounding Functions.


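Definition 10 (Domination). Let $F$ be a finite set of functions from $\mathbb{N}$ to $\mathbb{R}$. A function
$g : \mathbb{N} \to \mathbb{R}$ is dominating $F$ if eventually $\alpha \cdot g(i) \ge f(i)$ for all $f \in F$ and some $\alpha \in \mathbb{R}^+$. A
function $g : \mathbb{N} \to \mathbb{R}$ is dominated by $F$ if all $f \in F$ dominate $\{g\}$.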


Intuitively, a function $f$ dominates a function $g$ if $f$ eventually surpasses $g$ modulo a
positive constant factor. Exponential polynomials are sums of products of polynomials
with exponential functions, i.e. $\sum_j p_j(x) \cdot c_j^x$, where $c_j \in \mathbb{R}_0^+$. All functions arising in
Algorithms 1-4 are exponential polynomials. For a finite set $F$ of exponential polynomials,
a function dominating $F$ and a function dominated by $F$ are easily computable with
standard techniques, by analyzing the terms of the functions in the finite set $F$. With
dominating($F$) we denote an algorithm computing an exponential polynomial dominating
$F$. With dominated($F$) we denote an algorithm computing an exponential polynomial
dominated by $F$. We assume the functions returned by the algorithms dominating($F$)
and dominated($F$) to be monotone and either non-negative or non-positive.


Example 7 (Domination). The following statements are true: $0$ dominates $\{-i^3 + i^2 + 5\}$,
$i^2$ dominates $\{2i^2\}$, $i^2 \cdot 2^i$ dominates $\{i^2 \cdot 2^i + i^9, i^5 + i^3, 2^{-i}\}$, $i$ is dominated by
$\{i^2 - 2i + 1, 5i - 5\}$ and $-2^i$ is dominated by $\{2^i - i^2, -10 \cdot 2^{-i}\}$.
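
Domination claims in the sense of Definition 10 can be sanity-checked numerically. The sketch below is only a finite sample check over a window of iterations and not a proof; the helper `eventually_dominates`, the chosen constants $\alpha$ and the sample functions are assumptions made for illustration.

```python
def eventually_dominates(g, fs, alpha, start, stop):
    """Finite check of Definition 10: alpha * g(i) >= f(i) for all f in fs
    and all i in [start, stop). This samples a window only; it is not a proof."""
    return all(alpha * g(i) >= f(i) for f in fs for i in range(start, stop))

# i^2 * 2^i against an exponential polynomial, a polynomial and a decaying exponential
g = lambda i: i**2 * 2**i
fs = [lambda i: i**2 * 2**i + i**9, lambda i: i**5 + i**3, lambda i: 2.0 ** (-i)]
print(eventually_dominates(g, fs, alpha=2, start=60, stop=200))

# -i against two eventually negative functions: -i decreases the slowest
h = lambda i: -i
hs = [lambda i: -i + 50, lambda i: -(i**8)]
print(eventually_dominates(h, hs, alpha=0.5, start=100, stop=300))

# a failing case: no window beyond i = 10 satisfies 10 * i >= i^2
print(eventually_dominates(lambda i: i, [lambda i: i**2], alpha=10, start=100, stop=200))
```

Note that the polarity matters: a plain Big-O comparison would rank $-i^8$ above $-i$, whereas in the sense of Definition 10 the function $-i$ dominates $-i^8$.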


Definition 11 (Bounding Function for $\mathcal{L}$). Let $E$ be an arithmetic expression over the
program variables of $\mathcal{L}$. Let $l, u : \mathbb{N} \to \mathbb{R}$ be monotone and non-negative or non-positive.

1. $l$ is a lower bounding function for $E$ if eventually $\mathbb{P}(\alpha \cdot l(i) \le E_i \mid T^{\neg\mathcal{G}} > i) = 1$ for
   some $\alpha \in \mathbb{R}^+$.

2. $u$ is an upper bounding function for $E$ if eventually $\mathbb{P}(E_i \le \alpha \cdot u(i) \mid T^{\neg\mathcal{G}} > i) = 1$ for
   some $\alpha \in \mathbb{R}^+$.

3. An absolute bounding function for $E$ is an upper bounding function for $|E|$.

A bounding function imposes a bound on an expression $E$ over the program variables
holding eventually, almost surely, and modulo a positive constant factor. Moreover, bounds
on $E$ only need to hold as long as the program has not yet terminated.

Given a Prob-solvable loop $\mathcal{L}$ and a monomial $x$ over the program variables of $\mathcal{L}$,
Algorithm 1 computes a lower and upper bounding function for $x$. Because every polynomial
expression is a linear combination of monomials, the procedure can be used to
compute lower and upper bounding functions for any polynomial expression over $\mathcal{L}$’s
program variables by substituting every monomial with its lower or upper bounding
function depending on the sign of the monomial’s coefficient. Once a lower bounding
function $l$ and an upper bounding function $u$ are computed, an absolute bounding function
can be computed by dominating($\{u, -l\}$).

In Algorithm 1, candidates for bounding functions are modeled using recurrence
relations. Solutions $s(i)$ of these recurrences are closed-form candidates for bounding
functions parameterized by loop iteration $i$. Algorithm 1 relies on the existence of closed-form
solutions of recurrences. While closed-forms of general recurrences do not always
exist, a property of C-finite recurrences, linear recurrences with constant coefficients, is
that their closed-forms always exist and are computable [32]. In all occurring recurrences,
we consider a monomial over program variables as a single function. Therefore, throughout
this section, all recurrences arising from a Prob-solvable loop $\mathcal{L}$ in Algorithm 1 are
C-finite or can be turned into C-finite recurrences. Moreover, closed-forms $s(i)$ of C-finite
recurrences are given by exponential polynomials. Therefore, for any solution $s(i)$ to a
C-finite recurrence and any constant $r \in \mathbb{R}$, the following holds:


$$\exists \alpha, \beta \in \mathbb{R}^+,\ \exists i_0 \in \mathbb{N} :\ \forall i \ge i_0 :\ \alpha \cdot s(i) \le s(i+r) \le \beta \cdot s(i). \qquad (2)$$


Intuitively, the property states that constant shifts do not change the asymptotic behavior 
of s. We use this property at various proof steps in this section. Moreover, we recall that 
limits of exponential polynomials are computable [23]. 
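
As a worked instance of property (2), consider the sample exponential polynomial $s(i) = i^2 \cdot 2^i$ and the shift $r = 3$ (both chosen here purely for illustration):

```latex
s(i+3) = (i+3)^2 \cdot 2^{i+3}
       = 8 \cdot \left(1 + \tfrac{3}{i}\right)^2 \cdot i^2 \cdot 2^i
       = 8 \cdot \left(1 + \tfrac{3}{i}\right)^2 \cdot s(i),
\qquad\text{hence}\qquad
8 \cdot s(i) \;\le\; s(i+3) \;\le\; 128 \cdot s(i) \quad \text{for all } i \ge 1 .
```

So (2) holds for this example with $\alpha = 8$, $\beta = 128$ and $i_0 = 1$, because $1 \le (1 + 3/i)^2 \le 16$ for $i \ge 1$.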

For every monomial $x$, every branch $B \in supp(\mathcal{U}_{\mathcal{L}}^x)$ is a polynomial over the program
variables. Let $Rec(x) := \{\text{coefficient of } x \text{ in } B \mid B \in supp(\mathcal{U}_{\mathcal{L}}^x)\}$ denote the set
of coefficients of the monomial $x$ in all branches of $\mathcal{L}$. Let $Inhom(x) := \{B - c \cdot x \mid
B \in supp(\mathcal{U}_{\mathcal{L}}^x) \text{ and } c = \text{coefficient of } x \text{ in } B\}$ denote all the branches of the monomial $x$
without $x$ and its coefficient. The symbolic constants $c_1$ and $c_2$ in Algorithm 1 represent
arbitrary initial values of the monomial $x$ for which bounding functions are computed.
The fact that they are symbolic ensures that all potential initial values are accounted for.
$c_1$ represents positive initial values and $-c_2$ negative initial values. The symbolic constant
$d$ is used in the recurrences to account for the fact that the bounding functions only hold
modulo a constant. Intuitively, if we use the bounding function in a recurrence we need
to restore the lost constant. $Sign(x)$ is an over-approximation of the sign of the monomial
$x$, i.e., if $\exists i : \mathbb{P}(x_i > 0) > 0$, then $+ \in Sign(x)$ and if $\exists i : \mathbb{P}(x_i < 0) > 0$, then $- \in Sign(x)$.

Lemma 2, the computability of closed-forms of C-finite recurrences and the fact
that within a Prob-solvable loop only finitely many monomials can occur imply the
termination of Algorithm 1. Its correctness is stated in the next theorem.


Theorem 6 (Correctness of Algorithm 1). The functions $l(i), u(i)$ returned by Algorithm 1
on input $\mathcal{L}$ and $x$ are a lower and an upper bounding function for $x$, respectively.

Algorithm 1: Computing bounding functions for monomials

Input: A Prob-solvable loop $\mathcal{L}$ and a monomial $x$ over $\mathcal{L}$’s variables
Output: Lower and upper bounding functions $l(i)$, $u(i)$ for $x$
 1  inhomBoundsUpper := {upper bounding function of $P$ | $P \in Inhom(x)$}  (recursive call)
 2  inhomBoundsLower := {lower bounding function of $P$ | $P \in Inhom(x)$}  (recursive call)
 3  $U(i)$ := dominating(inhomBoundsUpper)
 4  $L(i)$ := dominated(inhomBoundsLower)
 5  maxRec := max $Rec(x)$
 6  minRec := min $Rec(x)$
 7  $I := \emptyset$
 8  if $+ \in Sign(x)$ then $I := I \cup \{c_1\}$
 9  if $- \in Sign(x)$ then $I := I \cup \{-c_2\}$
10  uCand := closed-forms of {$y_{i+1} = r \cdot y_i + d \cdot U(i)$ | $r \in$ {minRec, maxRec}, $y_0 \in I$}
11  lCand := closed-forms of {$y_{i+1} = r \cdot y_i + d \cdot L(i)$ | $r \in$ {minRec, maxRec}, $y_0 \in I$}
12  $u(i)$ := dominating(uCand)
13  $l(i)$ := dominated(lCand)
14  return $l(i)$, $u(i)$

Proof. Intuitively, it has to be shown that regardless of the paths through the loop body
taken by any program run, the value of $x$ is always eventually upper bounded by some function
in uCand and eventually lower bounded by some function in lCand (almost surely
and modulo positive constant factors). We show that $x$ is always eventually upper bounded
by some function in uCand. The proof for the lower bounding function is analogous.

Let $\vartheta \in \Sigma$ be a possible program run, i.e. $\mathbb{P}(Cyl(\pi)) > 0$ for all finite prefixes $\pi$ of $\vartheta$.
Then, for every $i \in \mathbb{N}$, if $T^{\neg\mathcal{G}}(\vartheta) > i$, the following holds:


$$x_{i+1}(\vartheta) = a_{(1)} \cdot x_i(\vartheta) + P_{(1)i}(\vartheta) \ \text{ or } \ x_{i+1}(\vartheta) = a_{(2)} \cdot x_i(\vartheta) + P_{(2)i}(\vartheta) \ \text{ or } \ \ldots \ \text{ or } \ x_{i+1}(\vartheta) = a_{(k)} \cdot x_i(\vartheta) + P_{(k)i}(\vartheta),$$


where $a_{(j)} \in Rec(x)$ and $P_{(j)} \in Inhom(x)$ are polynomials over program variables.
Let $u_1(i), \ldots, u_k(i)$ be upper bounding functions of $P_{(1)}, \ldots, P_{(k)}$, which are computed
recursively at line 10. Moreover, let $U(i) := $ dominating($\{u_1(i), \ldots, u_k(i)\}$), minRec =
min $Rec(x)$ and maxRec = max $Rec(x)$. Let $l_0 \in \mathbb{N}$ be the smallest number such that for
all $j \in \{1, \ldots, k\}$ and $i \ge l_0$:


$$\mathbb{P}(P_{(j)i} \le \alpha_j \cdot u_j(i) \mid T^{\neg\mathcal{G}} > i) = 1 \text{ for some } \alpha_j \in \mathbb{R}^+, \text{ and} \qquad (3)$$
$$u_j(i) \le \beta \cdot U(i) \text{ for some } \beta \in \mathbb{R}^+ \qquad (4)$$


Thus, all inequalities from the bounding functions $u_j$ and the dominating function $U$ hold
from $l_0$ onward. Because $U$ is a dominating function, it is by definition either non-negative
or non-positive. Assume $U(i)$ to be non-negative; the case for which $U(i)$ is non-positive is
symmetric. Using the facts (3) and (4), we establish: For the constant $\gamma := \beta \cdot \max_{j=1..k} \alpha_j$,
it holds that $\mathbb{P}(P_{(j)i} \le \gamma \cdot U(i) \mid T^{\neg\mathcal{G}} > i) = 1$ for all $j \in \{1, \ldots, k\}$ and all $i \ge l_0$. Let $l_1$ be the
smallest number such that $l_1 \ge l_0$ and $U(i + l_0) \le \delta \cdot U(i)$ for all $i \ge l_1$ and some $\delta \in \mathbb{R}^+$.


Case 1, $x_i$ is almost surely negative for all $i \ge l_1$: Consider the recurrence relation
$y_0 = m$, $y_{i+1} = $ minRec $\cdot\, y_i + \eta \cdot U(i)$, where $\eta := \max(\gamma, \delta)$ and $m$ is the maximum
value of $x_{l_1}(\vartheta)$ among all possible program runs $\vartheta$. Note that $m$ exists because there
are only finitely many values $x_{l_1}(\vartheta)$ for possible program runs $\vartheta$. Moreover, $m$ is negative
by our case assumption. By induction, we get $\mathbb{P}(x_i \le y_{i-l_1} \mid T^{\neg\mathcal{G}} > i) = 1$ for all
$i \ge l_1$. Therefore, for a closed-form solution $s(i)$ of the recurrence relation $y_i$, we get
$\mathbb{P}(x_i \le s(i - l_1) \mid T^{\neg\mathcal{G}} > i) = 1$ for all $i \ge l_1$. We emphasize that $s$ exists and can effectively
be computed because $y_i$ is C-finite. Moreover, $s(i - l_1) \le \theta \cdot s(i)$ for all $i \ge l_2$ for some
$l_2 \ge l_1$ and some $\theta \in \mathbb{R}^+$. Therefore, $s$ satisfies the bound condition of an upper bounding
function. Also, $s$ is present in uCand by choosing the symbolic constants $c_2$ and $d$ to
represent $-m$ and $\eta$ respectively. The function $u(i) := $ dominating(uCand), at line 12,
is dominating uCand (hence also $s$), is monotone and either non-positive or non-negative.
Therefore, $u(i)$ is an upper bounding function for $x$.


Case 2, $x_i$ is not almost surely negative for all $i \ge l_1$: Thus, there is a possible program
run $\vartheta'$ such that $x_i(\vartheta') \ge 0$ for some $i \ge l_1$. Let $l_2 \ge l_1$ be the smallest number
such that $x_{l_2}(\hat{\vartheta}) \ge 0$ for some possible program run $\hat{\vartheta}$. This number certainly exists,
as $x_i(\vartheta')$ is non-negative for some $i \ge l_1$. Consider the recurrence relation $y_0 = m$,
$y_{i+1} = $ maxRec $\cdot\, y_i + \eta \cdot U(i)$, where $\eta := \max(\gamma, \delta)$ and $m$ is the maximum value of
$x_{l_2}(\vartheta)$ among all possible program runs $\vartheta$. Note that $m$ exists because there are only finitely
many values $x_{l_2}(\vartheta)$ for possible program runs $\vartheta$. Moreover, $m$ is non-negative because
$m \ge x_{l_2}(\hat{\vartheta}) \ge 0$. By induction, we get $\mathbb{P}(x_i \le y_{i-l_2} \mid T^{\neg\mathcal{G}} > i) = 1$ for all $i \ge l_2$. Therefore,
for a solution $s(i)$ of the recurrence relation $y_i$, we get $\mathbb{P}(x_i \le s(i - l_2) \mid T^{\neg\mathcal{G}} > i) = 1$ for all
$i \ge l_2$. As above, $s$ exists and can effectively be computed because $y_i$ is C-finite. Moreover,
$s(i - l_2) \le \theta \cdot s(i)$ for all $i \ge l_3$ for some $l_3 \ge l_2$ and some $\theta \in \mathbb{R}^+$. Therefore, $s$ satisfies the
bound condition of an upper bounding function. Also, $s$ is present in uCand by choosing
the symbolic constants $c_1$ and $d$ to represent $m$ and $\eta$ respectively. The function $u(i) :=$
dominating(uCand), at line 12, is dominating uCand (hence also $s$), is monotone and either
non-positive or non-negative. Therefore, $u(i)$ is an upper bounding function for $x$.


Example 8 (Bounding functions). We illustrate Algorithm 1 by computing bounding
functions for $x$ and the Prob-solvable loop from Example 6: We have $Rec(x) := \{2, 1/2\}$ and
$Inhom(x) = \{y^2, 0\}$. Computing bounding functions recursively for $P \in Inhom(x) =
\{y^2, 0\}$ is simple, as we can give exact bounds leading to inhomBoundsUpper $= \{i^2, 0\}$
and inhomBoundsLower $= \{i^2, 0\}$. Consequently, we get $U(i) = i^2$, $L(i) = 0$, maxRec $= 2$
and minRec $= 1/2$. With a rudimentary static analysis of the loop, we determine the (exact)
over-approximation $Sign(x) := \{+\}$ by observing that $x_0 > 0$ and all $P \in Inhom(x)$ are
non-negative. Therefore, uCand is the set of closed-form solutions of the recurrences
$y_0 := c_1$, $y_{i+1} := 2y_i + d \cdot i^2$ and $y_0 := c_1$, $y_{i+1} := 1/2 \cdot y_i + d \cdot i^2$. Similarly, lCand is
the set of closed-form solutions of the recurrences $y_0 := c_1$, $y_{i+1} := 2y_i$ and $y_0 := c_1$,
$y_{i+1} := 1/2 \cdot y_i$. Using any algorithm for computing closed-forms of C-finite recurrences,
we obtain uCand $= \{c_1 2^i - d i^2 - 2di + 3d 2^i - 3d,\ c_1 2^{-i} + 2d i^2 - 8di - 12d 2^{-i} + 12d\}$
and lCand $= \{c_1 2^i, c_1 2^{-i}\}$. This leads to the upper bounding function $u(i) = 2^i$ and the
lower bounding function $l(i) = 2^{-i}$. The bounding functions $l(i)$ and $u(i)$ can be used to
compute bounding functions for expressions containing $x$ linearly by replacing $x$ by $l(i)$
or $u(i)$ depending on the sign of the coefficient of $x$. For instance, eventually and almost
surely the following inequality holds:
$-\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2} \le -\frac{1}{4} \cdot \alpha \cdot 2^{-i} - \frac{i^2}{2} - i - \frac{1}{2}$ for
some $\alpha \in \mathbb{R}^+$. The inequality results from replacing $x_i$ by $l(i)$. Therefore, eventually and
almost surely $-\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2} \le -\beta \cdot i^2$ for some $\beta \in \mathbb{R}^+$. Thus, $-i^2$ is an upper bounding
function for the expression $-\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2}$.
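
The two closed forms in uCand for Example 8 can be cross-checked by unrolling the recurrences with exact rational arithmetic. The sketch below is only an illustration: the function names and the sample values chosen for the symbolic constants $c_1$ and $d$ are assumptions.

```python
from fractions import Fraction

def iterate(r, d, c1, n):
    """Unroll y_0 = c1, y_{i+1} = r*y_i + d*i^2 and return [y_0, ..., y_n]."""
    ys = [Fraction(c1)]
    for i in range(n):
        ys.append(r * ys[-1] + d * i * i)
    return ys

def closed_r2(i, d, c1):
    # candidate for r = 2: c1*2^i - d*i^2 - 2*d*i + 3*d*2^i - 3*d
    return c1 * 2**i - d * i**2 - 2 * d * i + 3 * d * 2**i - 3 * d

def closed_rhalf(i, d, c1):
    # candidate for r = 1/2: c1*2^-i + 2*d*i^2 - 8*d*i - 12*d*2^-i + 12*d
    half_pow = Fraction(1, 2**i)
    return c1 * half_pow + 2 * d * i**2 - 8 * d * i - 12 * d * half_pow + 12 * d

# arbitrary positive sample values for the symbolic constants c1 and d
c1, d, n = Fraction(3), Fraction(5, 7), 25
ok_r2 = all(y == closed_r2(i, d, c1) for i, y in enumerate(iterate(2, d, c1, n)))
ok_rhalf = all(y == closed_rhalf(i, d, c1)
               for i, y in enumerate(iterate(Fraction(1, 2), d, c1, n)))
print(ok_r2, ok_rhalf)
```

Both closed forms are exponential polynomials; the first is dominated by its $2^i$ term, which is why $u(i) = 2^i$ is a valid choice for dominating(uCand).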


Remark 3. Algorithm 1 describes a general procedure computing bounding functions for
special sequences. Figuratively, that is for sequences $s$ such that $s_{i+1} = f(s_i, i)$ but in every
step the function $f$ is chosen non-deterministically among a fixed set of special functions
(corresponding to branches in our case). We reserve the investigation of applications of
bounding functions for such sequences beyond the probabilistic setting for future work.


5.3 Algorithms for Termination Analysis of Prob-solvable Loops 


Using Algorithm 1 to compute bounding functions for polynomial expressions over
program variables at hand, we are now able to formalize our algorithmic approaches
automating the termination analysis of Prob-solvable loops using the proof rules from
Section 4. Given a Prob-solvable loop $\mathcal{L}$ and a polynomial expression $E$ over $\mathcal{L}$’s variables,
we denote with lbf($E$), ubf($E$) and abf($E$) functions computing a lower, upper and
absolute bounding function for $E$ respectively. Our algorithmic approach for proving
PAST using the RSM-Rule is given in Algorithm 2.


Algorithm 2: Ranking-Supermartingale-Rule for proving PAST
Input: Prob-solvable loop $\mathcal{L}$
Output: If true then $\mathcal{L}$ with $G$ satisfies the RSM-Rule; hence $\mathcal{L}$ is PAST
1  $E := \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$
2  $u(i)$ := ubf($E$)
3  limit := $\lim_{i \to \infty} u(i)$
4  return limit $< 0$


Example 9 (Algorithm 2). Let us illustrate Algorithm 2 with the Prob-solvable loop from
Examples 6 and 8. Applying Algorithm 2 on $\mathcal{L}$ leads to $E = -\frac{x_i}{4} - \frac{i^2}{2} - i - \frac{1}{2}$. We obtain
the upper bounding function $u(i) := -i^2$ for $E$. Because $\lim_{i \to \infty} u(i) < 0$, Algorithm 2
returns true. This is valid because $u(i)$ having a negative limit witnesses that $E$ is eventually
bounded by a negative constant and therefore $G$ is eventually an RSM.


We recall that all functions arising from £ are exponential polynomials (see Section 5.2) 
and that limits of exponential polynomials are computable [23]. Therefore, the termination 
of Algorithm 2 is guaranteed and its correctness is stated next. 


Theorem 7 (Correctness of Algorithm 2). If Algorithm 2 returns true on input $\mathcal{L}$, then
$\mathcal{L}$ with $G_{\mathcal{L}}$ satisfies the RSM-Rule.


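Proof. When returning true at line 4 we have $\mathbb{P}(E_i \le \alpha \cdot u(i) \mid T^{\neg\mathcal{G}} > i) = 1$ for
all $i \ge i_0$ and some $i_0 \in \mathbb{N}$, $\alpha \in \mathbb{R}^+$. Moreover, $u(i) < -\epsilon$ for all $i \ge i_1$ for some
$i_1 \in \mathbb{N}$ and some $\epsilon > 0$, by the definition of lim. From this it follows that $\forall i \ge \max(i_0, i_1)$ almost surely
$\mathcal{G}_i \implies \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) \le -\alpha \cdot \epsilon$, which means $G$ is eventually an RSM.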


Our approach proving AST using the SM-Rule is captured with Algorithm 3. 

Algorithm 3: Supermartingale-Rule for proving AST
Input: Prob-solvable loop $\mathcal{L}$
Output: If true, $\mathcal{L}$ with $G$ satisfies the SM-Rule with constant $d$ and $p$; hence $\mathcal{L}$ is AST
1  $E := \mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i)$
2  $u(i)$ := ubf($E$)
3  if not eventually $u(i) \le 0$ then return false
4  for $B \in supp(\mathcal{U}_{\mathcal{L}}^G)$ do
5      $d(i)$ := ubf($B - G$)
6      limit := $\lim_{i \to \infty} d(i)$
7      if limit $< 0$ then return true
8  end
9  return false


Example 10 (Algorithm 3). Let us illustrate Algorithm 3 for the Prob-solvable loop $\mathcal{L}$
from Figure 2a: Applying Algorithm 3 on $\mathcal{L}$ yields $E = 0$ and $u(i) = 0$. The expression
$G$ ($= x$) has two branches. One of them is $x_i - y_i + 4$, which occurs with probability
1/2. When the for-loop of Algorithm 3 reaches this branch $B = x_i - y_i + 4$ on line 4, it
computes the difference $B - G = -y_i + 4$. An upper bounding function for $B - G$ is given
by $d(i) = -i$. Because $\lim_{i \to \infty} d(i) < 0$, Algorithm 3 returns true. This is valid because of
the branch $B$ witnessing that $G$ eventually decreases by at least a constant with probability
1/2. Therefore, all conditions of the SM-Rule are satisfied and $\mathcal{L}$ is AST.


Theorem 8 (Correctness of Algorithm 3). If Algorithm 3 returns true on input $\mathcal{L}$, then
$\mathcal{L}$ with $G_{\mathcal{L}}$ satisfies the SM-Rule with constant $d$ and $p$.


The proofs of Theorems 8 and 9 are similar to the one of Theorem 7 and
can be found in [40].

As established in Section 4, the relaxation of the R-AST-Rule requires that there is a
positive probability of reaching the iteration $i_0$ after which the conditions of the proof
rule hold. Regarding automation, we strengthen this condition by ensuring that there is
a positive probability of reaching any iteration, i.e. $\forall i \in \mathbb{N} : \mathbb{P}(\mathcal{G}_i) > 0$. Obviously, this implies
$\mathbb{P}(\mathcal{G}_{i_0}) > 0$. Furthermore, with CanReachAnyIteration($\mathcal{L}$) we denote a computable
under-approximation of $\forall i \in \mathbb{N} : \mathbb{P}(\mathcal{G}_i) > 0$. That means, CanReachAnyIteration($\mathcal{L}$)
implies $\forall i \in \mathbb{N} : \mathbb{P}(\mathcal{G}_i) > 0$. Our approach proving non-AST is summarized in Algorithm 4.


Example 11 (Algorithm 4). Let us illustrate Algorithm 4 for the Prob-solvable loop £ 


from Figure 2a: Applying Algorithm 4 on £ leads to E = a — 3 = a E 5 and to the 
upper bounding function u(i) = —1 for E on line 2. Therefore, the if-statement on line 3 


is not executed, which means $-G$ is eventually an $\epsilon$-repulsing supermartingale. Moreover,
with a simple static analysis of the loop, we establish CanReachAnyIteration($\mathcal{L}$) to be
true, as there is a positive probability that the loop guard does not decrease. Thus, the
if-statement on line 4 is not executed. Also, the if-statement on line 6 is not executed,
because $\epsilon(i) = -u(i) = 1$ is constant and therefore in $\Omega(1)$. $E$ eventually decreases by
$\epsilon = 1$ (modulo a positive constant factor), because $u(i) = -1$ is an upper bounding function
for E. We have differences = {1— # ,1 + # }. Both expressions in differences have an 
absolute bounding function of 1. Therefore, diffBounds $= \{1\}$. As a result on line 9 we
have $c(i) = 1$, which eventually and almost surely is an upper bound on $|-G_{i+1} + G_i|$

Algorithm 4: Repulsing-AST-Rule for proving non-AST
Input: Prob-solvable loop $\mathcal{L}$
Output: If true, $\mathcal{L}$ with $-G$ satisfies the R-AST-Rule; hence $\mathcal{L}$ is not AST
 1  $E := \mathbb{E}(-G_{i+1} + G_i \mid \mathcal{F}_i)$
 2  $u(i)$ := ubf($E$)
 3  if not eventually $u(i) \le 0$ then return false
 4  if $\neg$CanReachAnyIteration($\mathcal{L}$) then return false
 5  $\epsilon(i) := -u(i)$
 6  if $\epsilon(i) \notin \Omega(1)$ then return false
 7  differences := $\{B + G \mid B \in supp(\mathcal{U}_{\mathcal{L}}^{-G})\}$
 8  diffBounds := $\{$abf($d$) $\mid d \in$ differences$\}$
 9  $c(i)$ := dominating(diffBounds)
10  return $c(i) \in O(1)$



(modulo a positive constant factor). Therefore, the algorithm returns true. This is correct, 
as all the preconditions of the R-AST-Rule are satisfied (and therefore £ is not AST). 


Theorem 9 (Correctness of Algorithm 4). If Algorithm 4 returns true on input $\mathcal{L}$, then
$\mathcal{L}$ with $-G_{\mathcal{L}}$ satisfies the R-AST-Rule.


Because the R-PAST-Rule is a slight variation of the R-AST-Rule, Algorithm 4 can 
be slightly modified to yield a procedure for the R-PAST-Rule. An algorithm for the 
R-PAST-Rule is provided in [40]. 


5.4 Ruling out Proof Rules for Prob-Solvable Loops 


A question arising when combining our algorithmic approaches from Section 5.3 into a
unifying framework is: given a Prob-solvable loop $\mathcal{L}$, which algorithm should be applied first
to determine $\mathcal{L}$’s termination behavior? In [4] the authors provide an algorithm for
computing an algebraically closed-form of $\mathbb{E}(M_i)$, where $M$ is a polynomial over $\mathcal{L}$’s
variables. The following lemma explains how the expression $\mathbb{E}(M_{i+1} - M_i)$ relates to the
expression $\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i)$. The lemma follows from the monotonicity of $\mathbb{E}$.


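Lemma 3 (Rule out Rules for $\mathcal{L}$). Let $(M_i)_{i \in \mathbb{N}}$ be a stochastic process. If $\mathbb{E}(M_{i+1} - M_i \mid
\mathcal{F}_i) \le -\epsilon$ then $\mathbb{E}(M_{i+1} - M_i) \le -\epsilon$ for any $\epsilon \in \mathbb{R}^+$.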


The contrapositive of Lemma 3 provides a criterion to rule out the viability of a given
proof rule. For a Prob-solvable loop $\mathcal{L}$, if $\mathbb{E}(G_{i+1} - G_i) \not\le 0$ then $\mathbb{E}(G_{i+1} - G_i \mid \mathcal{F}_i) \not\le 0$,
meaning $G$ is not a supermartingale. The expression $\mathbb{E}(G_{i+1} - G_i)$ depends only on $i$ and
can be computed by $\mathbb{E}(G_{i+1} - G_i) = \mathbb{E}(G_{i+1}) - \mathbb{E}(G_i)$, where the expected value $\mathbb{E}(G_i)$
is computed as in [4]. Therefore, in some cases, proof rules can automatically be deemed
nonviable, without the need to compute bounding functions.
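
One way to spell out the step from the conditional to the unconditional expectation in Lemma 3 is the following short derivation. It is a sketch that assumes the expectations exist; it combines the law of total expectation with the monotonicity of $\mathbb{E}$:

```latex
\mathbb{E}(M_{i+1} - M_i)
  \;=\; \mathbb{E}\bigl(\mathbb{E}(M_{i+1} - M_i \mid \mathcal{F}_i)\bigr)
  \;\le\; \mathbb{E}(-\epsilon)
  \;=\; -\epsilon .
```

Read contrapositively, a closed form of $\mathbb{E}(G_{i+1}) - \mathbb{E}(G_i)$ that violates the unconditional bound already rules out the corresponding conditional bound.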


6 Implementation and Evaluation 


6.1 Implementation 


We implemented and combined our algorithmic approaches from Section 5 in the new
software tool AMBER, which stands for Asymptotic Martingale Bounds. AMBER and all benchmarks
are available at https://github.com/probing-lab/amber. AMBER uses MORA [4][6]
for computing the first-order moments of program variables and the DIOFANT package⁵
as its computer algebra system.


Computing dominating and dominated The dominating and dominated procedures
used in Algorithms 1 and 4 are implemented by combining standard algorithms for Big-O
analysis and bookkeeping of the asymptotic polarity of the input functions. Let us illustrate
this. Consider the following two input-output pairs which our implementation would
produce: (a) dominating($\{i^2 + 10, 10 \cdot i^5 - i^3\}$) $= i^5$ and (b) dominating($\{-i + 50, -i^8 +
i^2 - 3 \cdot i^3\}$) $= -i$. For (a), $i^5$ is eventually greater than all functions in the input set modulo a
constant factor because all functions in the input set are $O(i^5)$. Therefore, $i^5$ dominates the
input set. For (b), the first function is $O(i)$ and the second is $O(i^8)$. In this case, however,
both functions are eventually negative. Therefore, $-i$ is a function dominating the input
set. Important is the fact that an exponential polynomial $\sum_j p_j(i) \cdot c_j^i$, where $c_j \in \mathbb{R}_0^+$, will
always be eventually either only positive or only negative (or 0 if identical to 0).
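
The polarity bookkeeping can be sketched for the special case of plain polynomials in $i$. This is not AMBER's implementation, which handles general exponential polynomials; the dictionary representation `{degree: coefficient}`, the function names and the sample inputs are assumptions made for illustration.

```python
def leading(poly):
    """Return (sign, degree) of the leading term of a polynomial given as
    {degree: coefficient}; the zero polynomial yields (0, 0)."""
    terms = [(deg, c) for deg, c in poly.items() if c != 0]
    if not terms:
        return 0, 0
    deg, c = max(terms)  # degrees are unique, so this picks the highest degree
    return (1 if c > 0 else -1), deg

def dominating(polys):
    """Sketch of `dominating` restricted to plain polynomials in i.

    Returns (sign, degree), to be read as the function sign * i^degree."""
    leads = [leading(p) for p in polys]
    positive = [deg for sign, deg in leads if sign > 0]
    if positive:
        # some input is eventually positive: the fastest growth dominates
        return 1, max(positive)
    if any(sign == 0 for sign, _ in leads):
        # the zero function dominates all eventually negative inputs
        return 0, 0
    # all inputs are eventually negative: the slowest decrease dominates
    return -1, min(deg for _, deg in leads)

# sample inputs: {i^2 + 10, 10*i^5 - i^3} and {-i + 50, -i^8 - 3*i^3 + i^2}
print(dominating([{2: 1, 0: 10}, {5: 10, 3: -1}]))         # (1, 5), i.e. i^5
print(dominating([{1: -1, 0: 50}, {8: -1, 3: -3, 2: 1}]))  # (-1, 1), i.e. -i
```

A `dominated` counterpart could be obtained symmetrically, for example by negating all inputs, calling `dominating`, and negating the result.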


Sign Over-Approximation The over-approximation $Sign(x)$ of the signs of a monomial
$x$ used in Algorithm 1 is implemented by a simple static analysis: For a monomial $x$
consisting solely of even powers, $Sign(x) = \{+\}$. For a general monomial $x$, if $x_0 \ge 0$ and
all monomials on which $x$ depends, together with their associated coefficients, are always
positive, then $- \notin Sign(x)$. For example, if $supp(\mathcal{U}_{\mathcal{L}}^x) = \{x_i + 2y_i - 3z_i, x_i + u_i\}$, then
$- \notin Sign(x)$ if $x_0 \ge 0$ as well as $- \notin Sign(y)$, $+ \notin Sign(z)$ and $- \notin Sign(u)$. Otherwise,
$- \in Sign(x)$. The over-approximation for $+ \notin Sign(x)$ is analogous.


Reachability Under-Approximation CanReachAnyIteration($\mathcal{L}$), used in Algorithm 4,
needs to satisfy the property that if it returns true, then loop $\mathcal{L}$ reaches any iteration
with positive probability. In AMBER, we implement this under-approximation as follows:
CanReachAnyIteration($\mathcal{L}$) is true if there is a branch $B$ of the loop guard polynomial $G_{\mathcal{L}}$
such that $B - G_{\mathcal{L}\,i}$ is non-negative for all $i \in \mathbb{N}$. Otherwise, CanReachAnyIteration($\mathcal{L}$)
is false. In other words, if CanReachAnyIteration($\mathcal{L}$) is true, then in any iteration there
is a positive probability of $G_{\mathcal{L}}$ not decreasing.


Bound Computation Improvements In addition to Algorithm 1 computing bounding func- 
tions for monomials of program variables, AMBER implements the following refinements: 


1. A monomial $x$ is deterministic, which means it is independent of probabilistic choices,
   if $x$ has a single branch and only depends on monomials having single branches. In
   this case, the exact value of $x$ in any iteration is given by its first-order moments and
   bounding functions can be obtained by using these exact representations.

2. Bounding functions for an odd power $p$ of a monomial $x$ can be computed by $u(i)^p$
   and $l(i)^p$, where $u(i)$ is an upper and $l(i)$ a lower bounding function for $x$.


Whenever the above enhancements are applicable, AMBER prefers them over Algorithm 1. 


5 https://github.com/diofant/diofant 
6.2 Experimental Setting and Results


Experimental Setting and Comparisons Regarding programs which are PAST, we com- 
pare AMBER against the tool ABSYNTH [42] and the tool in [10] which we refer to 
as MGEN. ABSYNTH uses a system of inference rules over the syntax of probabilistic 
programs to derive bounds on the expected resource consumption of a program and can, 
therefore, be used to certify PAST. In contrast to AMBER, ABSYNTH requires the degree
of the bound to be provided upfront. Moreover, ABSYNTH cannot refute the existence
of a bound and therefore cannot handle programs that are not PAST. MGEN uses linear 
programming to synthesize linear martingales and supermartingales for probabilistic 
transition systems with linear variable updates. To certify PAST, we extended MGEN [10] 
with the SMT solver Z3 [41] in order to find or refute the existence of conical combinations 
of the (super)martingales derived by MGEN which yield RSMs. 

With AMBER-LIGHT we refer to a variant of AMBER without the relaxations of the
proof rules introduced in Section 4. That is, with AMBER-LIGHT the conditions of the
proof rules need to hold for all $i \in \mathbb{N}$, whereas with AMBER the conditions are allowed to
only hold eventually. For all benchmarks, we compare AMBER against AMBER-LIGHT to
show the effectiveness of the respective relaxations. For each experimental table (Tables 1-3),
✓ symbolizes that the respective tool successfully certified PAST/AST/non-AST for
the given program; ✗ means it failed to certify PAST/AST/non-AST. Further, NA indicates
the respective tool failed to certify PAST/AST/non-AST because the given program is
out of scope of the tool’s capabilities. Every benchmark has been run on a machine with
a 2.2 GHz Intel i7 (Gen 6) processor and 16 GB of RAM and finished within a timeout
of 50 seconds, where most benchmarks terminated within a few seconds.


Benchmarks We evaluated AMBER on 38 probabilistic programs. We present our
experimental results by separating our benchmarks into three categories: (i) 21 programs
which are PAST (Table 1), (ii) 11 programs which are AST (Table 2) but not necessarily
PAST, and (iii) 6 programs which are not AST (Table 3). The benchmarks have either been
introduced in the literature on probabilistic programming [42,10,4,22,38], are adaptations
of well-known stochastic processes or have been designed specifically to test unique
features of AMBER, like the ability to handle polynomial real arithmetic.

The 21 PAST benchmarks consist of 10 programs representing the original benchmarks
of MGEN [10] and ABSYNTH [42], augmented with 11 additional probabilistic
programs. Not all benchmarks of MGEN and ABSYNTH could be used for our comparison,
as MGEN and ABSYNTH target related but different computation tasks than certifying
PAST. Namely, MGEN aims to synthesize (super)martingales, but not ranking ones,
whereas ABSYNTH focuses on computing bounds on the expected runtime. Therefore,
from the 50 benchmarks of [10] (11) and [42] (39), we adopted those for which the termination
behavior is non-trivial. A benchmark is trivial regarding PAST if either (i) there is no loop,
(ii) the loop is bounded by a constant, or (iii) the program is meant to run forever. Moreover,
we removed programs for which the witness for PAST is just a trivial
combination of witnesses for already included programs. For instance, the benchmarks
of [42] contain multiple programs that are concatenated constant biased-random-walks.
These are relevant benchmarks when evaluating ABSYNTH for discovering bounds, but
would blur the picture when comparing against AMBER for PAST certification. With
these criteria, 10 out of the 50 original benchmarks of [10] and [42] remain. We add 11
additional benchmarks which have either been introduced in the literature on probabilistic
programming [4,22,38], are adaptations of well-known stochastic processes or have been
designed specifically to test unique features of AMBER. Notably, out of the 50 original
benchmarks from [42] and [10], only 2 remain which are included in our benchmarks
and which AMBER cannot prove PAST (because they are not Prob-solvable). All our
benchmarks are available at https://github.com/probing-lab/amber.


                         AMBER   AMBER-LIGHT   MGEN+Z3   ABSYNTH
Total ✓ (of 21)            18         12           8         9

Programs: 2d_bounded_random_walk, biased_random_walk_constant, biased_random_walk_exp,
biased_random_walk_poly, binomial_past, complex_past, consecutive_bernoulli_trails,
coupon_collector_4, coupon_collector_5, dueling_cowboys, exponential_past_1,
exponential_past_2, geometric, geometric_exponential, linear_past_1, linear_past_2,
nested_loops, polynomial_past_1, polynomial_past_2, sequential_loops, tortoise_hare_race
(per-program entries not reproduced).

Table 1: 21 programs which are PAST.


Experiments with PAST — Table 1: Out of the 21 PAST benchmarks, AMBER certifies 18 
programs. AMBER cannot handle the benchmarks nested_loops and sequential_loops, as 
these examples use nested or sequential loops and thus are not expressible as Prob-solvable 
loops. The benchmarks exponential_past_1 and exponential_past_2 are out of scope of
ABSYNTH because they require real numbers, while ABSYNTH can only handle integers.
MGEN+Z3 cannot handle benchmarks containing non-linear variable updates or non- 
linear guards. Table 1 shows that AMBER outperforms both ABSYNTH and MGEN+Z3 for 
Prob-solvable loops, even when our relaxed proof rules from Section 4 are not used. Yet, 
our experiments show that our relaxed proof rules enable AMBER to certify 6 examples 
to be PAST, which could not be proved without these relaxations by AMBER-LIGHT. 


Experiments with AST — Table 2: We compare AMBER against AMBER-LIGHT on 
11 benchmarks which are AST but not necessarily PAST and also cannot be split into 
PAST subprograms. Therefore, the SM-Rule is needed to certify AST. To the best of 
our knowledge, AMBER is the first tool able to certify AST for such programs. Existing 
approaches like [1] and [14] can only witness AST for non-PAST programs if, intuitively
speaking, the programs contain subprograms which are PAST. Therefore, we compared
AMBER only against AMBER-LIGHT on this set of examples. The benchmark
symmetric_2d_random_walk, which AMBER fails to certify as AST, models the symmetric
random walk in $\mathbb{R}^2$ and is still out of reach of current automation techniques. In [38] the authors
mention that a closed-form expression M and functions p and d satisfying the conditions of 
the SM-Rule have not been discovered yet. The benchmark fair_in_limit_random_walk in- 
volves non-constant probabilities and can therefore not be modeled as a Prob-solvable loop. 


Experiments with non-AST — Table 3: We compare AMBER against AMBER-LIGHT on 
6 benchmarks which are not AST. To the best of our knowledge, AMBER is the first tool 
able to certify non-AST for such programs, and thus we compared AMBER only against 
AMBER-LIGHT. In [13], where the notion of repulsing supermartingales and the R-AST- 
Rule are introduced, the authors also propose automation techniques. However, the authors 
of [13] state that their “experimental results are basic” and their computational methods
are evaluated on only 3 examples, without having any available tool support. For the bench- 
marks in Table 3, the outcomes of AMBER and AMBER-LIGHT coincide. The reason for 
this is R-AST-Rule’s condition that the martingale expression has to have c-bounded differ- 
ences. This condition forces a suitable martingale expression to be bounded by a linear func- 
tion, which is also the reason why AMBER cannot certify the benchmark polynomial_nast. 


Experimental Summary Our results from Tables 1-3 demonstrate that: 


— AMBER outperforms the state-of-the-art in automating PAST certification for Prob- 
solvable loops (Table 1). 

— Complex probabilistic programs which are AST and not PAST as well as programs 
which are not AST can automatically be certified as such by AMBER (Tables 2, 3). 

— The relaxations of the proof rules introduced in Section 4 are helpful in automating 
the termination analysis of probabilistic programs, as evidenced by the performance 
of AMBER against AMBER-LIGHT (Tables 1-3). 


7 Related Work 


Proof Rules for Probabilistic Termination Several proof rules have been proposed in the 
literature to provide sufficient conditions for the termination behavior of probabilistic 
programs. The work of [10] uses martingale theory to characterize positive almost sure 
termination (PAST). In particular, the notion of a ranking supermartingale (RSM) is intro- 
duced together with a proof rule (RSM-Rule) to certify PAST, as discussed in Section 3.1. 
The approach of [19] extended this method to include (demonic) non-determinism and 
continuous probability distributions, showing the completeness of the RSM-Rule for this 
program class. The compositional approach proposed in [19] was further strengthened 
in [29] to a sound approach using the notion of descent supermartingale map. In [1], the
authors introduced lexicographic RSMs. 

The SM-Rule discussed in Section 3.2 was introduced in [38]. It is worth mentioning
that this proof rule is also applicable to non-deterministic probabilistic programs. The work
of [28] presented an independent proof rule based on supermartingales with lower bounds
on conditional absolute differences. Both proof rules are based on supermartingales and
can certify AST for programs that are not necessarily PAST. The approach of [43] examined
martingale-based techniques for obtaining bounds on reachability probabilities, and
thus termination probabilities, from an order-theoretic viewpoint. The notions of nonnegative
repulsing supermartingales and γ-scaled submartingales, accompanied by sound and
complete proof rules, have also been introduced. The R-AST-Rule from Section 3.3 was
proposed in [13] mainly for obtaining bounds on the probability of stochastic invariants.


Program                              AMBER   AMBER-LIGHT
fair_in_limit_random_walk             NA         NA
gambling                              ✓          ✓
symmetric_2d_random_walk              ✗          ✗
symmetric_random_walk_constant_1      ✓          ✓
symmetric_random_walk_constant_2      ✓          ✓
symmetric_random_walk_exp_1           ✓          ✗
symmetric_random_walk_exp_2           ✓          ✗
symmetric_random_walk_linear_1        ✓          ✗
symmetric_random_walk_linear_2        ✓          ✓
symmetric_random_walk_poly_1          ✓          ✗
symmetric_random_walk_poly_2          ✓          ✗
Total ✓                               9          4

Table 2: 11 programs which are AST and not necessarily PAST.


Program                              AMBER   AMBER-LIGHT
biased_random_walk_nast_1             ✓          ✓
biased_random_walk_nast_2             ✓          ✓
biased_random_walk_nast_3             ✓          ✓
biased_random_walk_nast_4             ✓          ✓
binomial_nast                         ✓          ✓
polynomial_nast                       ✗          ✗
Total ✓                               5          5

Table 3: 6 programs which are not AST.

An alternative approach is to exploit weakest precondition techniques for probabilistic 
programs, as presented in the seminal works [34,35] that can be used to certify AST. The 
work of [37] extended this approach to programs with non-determinism and provided 
several proof rules for termination. These techniques are purely syntax-based. In [31] a 
weakest precondition calculus for obtaining bounds on expected termination times was 
proposed. This calculus comes with proof rules to reason about loops. 


Automation of Martingale Techniques The work of [10] proposed an automated procedure 
— by using Farkas’ lemma — to synthesize linear (super)martingales for probabilistic 
programs with linear variable updates. This technique was considered in our experimental 
evaluation, cf. Section 6. The algorithmic construction of supermartingales was extended 
to treat (demonic) non-determinism in [12] and to polynomial supermartingales in [11] 
using semi-definite programming. The recent work of [14] uses ω-regular decomposition
to certify AST. They exploit so-called localized ranking supermartingales, which can be 
synthesized efficiently but must be linear. 


Other Approaches Abstract interpretation is used in [39] to prove the probabilistic ter- 
mination of programs for which the probability of taking a loop k times decreases at least 
exponentially with k. In [18], a sound and complete procedure deciding AST is given 
for probabilistic programs with a finite number of reachable states from any initial state. 
The work of [42] gave an algorithmic approach based on potential functions for com-
puting bounds on the expected resource consumption of probabilistic programs. In [36], 
model checking is exploited to automatically verify whether a parameterized family of 
probabilistic concurrent systems is AST. 

Finally, the class of Prob-solvable loops considered in this paper extends [4] to a wider 
class of loops. While [4] focused on computing statistical higher-order moments, our 
work addresses the termination behavior of probabilistic programs. The related approach 
of [22] computes exact expected runtimes of constant probability programs and provides 
a decision procedure for AST and PAST for such programs. Our programming model 
strictly generalizes the constant probability programs of [22], by supporting polynomial 
loop guards, updates and martingale expressions. 


8 Conclusion 


This paper reported on the automation of termination analysis of probabilistic while- 
programs whose guards and expressions are polynomial expressions. To this end, we 
introduced mild relaxations of existing proof rules for AST, PAST, and their negations, 
by requiring their sufficient conditions to hold only eventually. The key to our approach 
is that the structural constraints of Prob-solvable loops allow for automatically computing 
almost sure asymptotic bounds on polynomials over program variables. Prob-solvable 
loops cover a vast set of complex and relevant probabilistic processes including random 
walks and dynamic Bayesian networks [5]. Only two out of 50 benchmarks in [10,42] 
are outside the scope of Prob-solvable loops regarding PAST certification. The almost 
sure asymptotic bounds were used to formalize algorithmic approaches for proving AST, 
PAST, and their negations. Moreover, for Prob-solvable loops four different proof rules 
from the literature uniformly come together in our work. 

Our approach is implemented in the software tool AMBER (github.com/probing-lab/amber),
offering a fully automated approach to probabilistic termination. Our experimental
results show that our relaxed proof rules enable proving probabilistic (non-)
termination of more programs than could be treated before. A comparison to the state of
the art in automated analysis of probabilistic termination reveals that AMBER significantly
outperforms related approaches. To the best of our knowledge, AMBER is the first tool 
to automate AST, PAST, non-AST and non-PAST in a single tool-chain. 

There are several directions for future work. These include extensions to Prob-solvable 
loops such as symbolic distributions, more complex control flow, and non-determinism. 
We will also consider program transformations that translate programs into our format. Ex- 
tensions of the SM-Rule algorithm with non-constant probability and decrease functions 
are also in our interest. 
Abstract. We introduce Bayesian strategies, a new interpretation of
probabilistic programs in game semantics. This interpretation can be 
seen as a refinement of Bayesian networks. 

Bayesian strategies are based on a new form of event structure, with two 
causal dependency relations respectively modelling control flow and data 
flow. This gives a graphical representation for probabilistic programs 
which resembles the concrete representations used in modern implemen- 
tations of probabilistic programming. 

From a theoretical viewpoint, Bayesian strategies provide a rich setting 
for denotational semantics. To demonstrate this we give a model for a 
general higher-order programming language with recursion, conditional 
statements, and primitives for sampling from continuous distributions 
and trace re-weighting. This is significant because Bayesian networks do 
not easily support higher-order functions or conditionals. 


1 Introduction 


One promise of probabilistic programming languages (PPLs) is to make Bayesian 
statistics accessible to anyone with a programming background. In a PPL, the 
programmer can express complex statistical models clearly and precisely, and 
they additionally gain access to the set of inference tools provided by the prob- 
abilistic programming system, which they can use for simulation, data analysis, 
etc. Such tools are usually designed so that the user does not require any in-depth 
knowledge of Bayesian inference algorithms. 

A challenge for language designers is to provide efficient inference algorithms. 
This can be intricate, because programs can be arbitrarily complex, and infer- 
ence requires a close interaction between the inference engine and the language 
interpreter [42, Ch.6]. In practice, many modern inference engines do not manipulate
the program syntax directly but instead exploit some representation of
it, more suited to the type of inference method at hand (Metropolis-Hastings 
(MH), Sequential Monte Carlo (SMC), Hamiltonian Monte Carlo, variational 
inference, etc.). 

While many authors have recently given proofs of correctness for inference 
algorithms (see for example [11,24,32]), most have focused on idealised descrip- 
tions of the algorithms, based on syntax or operational semantics, rather than on 
the concrete program representations used in practice. In this paper we instead 
put forward a mathematical semantics for probabilistic programs designed to
provide reasoning tools for existing implementations of inference. 

Our work targets a specific class of representations which we call data flow 
representations. We understand data flow as describing the dependence re- 
lationships between random variables of a program. This is in contrast with 
control flow, which describes in what order samples are performed. Such data 
flow representations are widely used in practice. We give a few examples. For 
Metropolis-Hastings inference, Church [30] and Venture [41] manipulate depen- 
dency graphs for random variables (“computation traces” or “probabilistic exe- 
cution traces”); Infer.NET [22] compiles programs to factor graphs in order to 
apply message passing algorithms; for a subset of well-behaved programs, Gen 
[23] statically constructs a representation based on certain combinators which 
is then exploited by a number of inference algorithms; and finally, for varia- 
tional inference, Pyro [9] and Edward [55] rely on data flow graphs for efficient 
computation of gradients by automatic differentiation. (Also [52,28].) 

In this paper, we make a step towards correctness of these implementations 
and introduce Bayesian strategies, a new representation based on Winskel’s 
event structures [46] which tracks both data flow and control flow. The Bayesian 
strategy corresponding to a program is obtained compositionally as is standard 
in concurrent game semantics [63], and provides an intensional foundation for 
probabilistic programs, complementary to existing approaches [24,57]. 

This paper was inspired by the pioneering work of Ścibior et al. [53], which
provides the first denotational analysis for concrete inference representations. In 
particular, their work provides a general framework for proving correct inference 
algorithms based on static representations. But the authors do not show how 
their framework can be used to accommodate data flow representations or verify 
any of the concrete implementations mentioned above. The work of this paper 
does not fill this gap, as we make no attempt to connect our semantic construc- 
tions with those of [53], or indeed to prove correct any inference algorithms. This 
could be difficult, because our presentation arises out of previous work on game 
semantics and thus does not immediately fit in with the monadic techniques 
employed in [53]. Nonetheless, efforts to construct game semantics monadically 
are underway [14], and it is hoped that the results presented here will set the 
ground for the development of event structure-based validation of inference. 


1.1 From Bayesian networks to Bayesian strategies 


Consider the following basic model, found in the Pyro tutorials (and also used 
in [39]), used to infer the weight of an object based on two noisy measurements. 
The measurements are represented by random variables meas1 and meas2, whose
values are drawn from a normal distribution around the true weight (weight),
whose prior distribution is also normal, and centered at 2. (In this situation,
meas1 and meas2 are destined to be conditioned on actual observed values, and
the problem is then to infer the posterior distribution of weight based on these 
observations. We leave out conditioning in this example and focus on the model 
specification.) 
To describe this model it is convenient to use a Bayesian network, i.e. a DAG
of random variables in which the distribution of each variable depends only on 
the value of its parents: 


                     weight : R  ~  N(2, 1)
                    /                      \
                   v                        v
    meas1 : R  ~  N(weight, 0.1)      meas2 : R  ~  N(weight, 0.1)


The same probabilistic model can be encoded in an ML-style language: 


    let weight = sample_weight normal(2, 1) in
    sample_meas1 normal(weight, 0.1);
    sample_meas2 normal(weight, 0.1);
    ()


Our choice of sampling meas1 before meas2 is arbitrary: the same program with
the second and third lines swapped corresponds to the same probabilistic model. 
This redundancy is unavoidable because programs are inherently sequential. It 
is the purpose of “commutative” semantics for probabilistic programs, as intro- 
duced by Staton et al. [54,57], to clarify this situation. They show that reordering 
program lines does not change the semantics, even in the presence of condition- 
ing. This result says that when specifying a probabilistic model, only data flow 
matters, and not control flow. This motivates the use of program representations 
based on data flow such as the examples listed above. 
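To make the remark about ordering concrete, here is a minimal forward-sampling sketch of the weight model in plain Python. It is an illustration only: the function names are hypothetical, conditioning is left out as in the text, and the second parameter of each normal distribution is assumed to be a standard deviation, which the text does not specify.

```python
import random
import statistics


def run_model(rng: random.Random, swap: bool = False):
    """Draw (weight, meas1, meas2); `swap` samples meas2 before meas1."""
    weight = rng.gauss(2.0, 1.0)          # prior centered at 2
    if swap:
        meas2 = rng.gauss(weight, 0.1)
        meas1 = rng.gauss(weight, 0.1)
    else:
        meas1 = rng.gauss(weight, 0.1)
        meas2 = rng.gauss(weight, 0.1)
    return weight, meas1, meas2


def mean_meas1(swap: bool, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expectation of meas1."""
    rng = random.Random(seed)
    return statistics.fmean(run_model(rng, swap)[1] for _ in range(n))


print(mean_meas1(swap=False), mean_meas1(swap=True))
```

Individual runs differ between the two orderings because the random number stream is consumed in a different order, but both estimates approximate the same expectation: only the dependence of each measurement on weight matters for the distribution.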

In our game semantics, a probabilistic program is interpreted as a control 
flow graph annotated by a data dependency relation. The Bayesian strategy 
associated with the program above is as follows: 


    control flow:   weight : R  →  meas1 : R  →  meas2 : R  →  () : 1
    data flow:      weight : R  ⇢  meas1 : R,     weight : R  ⇢  meas2 : R


where (in brief), ⇢ is data flow, → is control flow, and () : 1 is the program
output (drawn as a dashed node). (Probability distributions are as in the Bayesian network.)
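One way to picture "a control flow graph annotated by a data dependency relation" is as a set of events with two separate edge relations. The sketch below is a simplified illustration rather than the paper's definition of a Bayesian strategy: the class name `FlowGraph` and its methods are hypothetical, probabilities are omitted, and the control-flow edges are assumed to follow the program's sampling order.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class FlowGraph:
    events: List[str]
    control: Set[Edge] = field(default_factory=set)  # order in which events occur
    data: Set[Edge] = field(default_factory=set)     # which values an event depends on

    def data_parents(self, event: str) -> List[str]:
        return sorted(a for a, b in self.data if b == event)

    def control_order(self) -> List[str]:
        """Topological order of the events with respect to control flow (Kahn's algorithm)."""
        indegree = {e: 0 for e in self.events}
        for _, target in self.control:
            indegree[target] += 1
        ready = [e for e in self.events if indegree[e] == 0]
        order = []
        while ready:
            event = ready.pop(0)
            order.append(event)
            for source, target in sorted(self.control):
                if source == event:
                    indegree[target] -= 1
                    if indegree[target] == 0:
                        ready.append(target)
        return order


strategy = FlowGraph(
    events=["weight", "meas1", "meas2", "output"],
    control={("weight", "meas1"), ("meas1", "meas2"), ("meas2", "output")},
    data={("weight", "meas1"), ("weight", "meas2")},
)
print(strategy.control_order(), strategy.data_parents("meas2"))
```

Swapping the two measurement lines of the program would change `control` but leave `data` untouched, which is the sense in which the representation separates the two kinds of dependency.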

The semantics is not commutative, simply because reordering lines affects 
control flow; we emphasise that the point of this work is not to prove any new 
program equations, but instead to provide a formal framework for the represen- 
tations involved in practical inference settings. 


1.2 Our approach 


To formalise this idea we use event structures, which naturally model control 
flow, enriched with additional structure for probability and an explicit data 
flow relation. Event structures were used in previous work by the author and
Castellan on probabilistic programming [18], and were shown to be a good fit for 
reasoning about MH inference. But the representation in [18] combines data flow 
and control flow in a single transitive relation, and thus suffers from important 
limitations. The present paper is a significant improvement: by maintaining a 
clear separation between control flow and data flow, we can reframe the ideas in 
the well-established area of concurrent game semantics [63], which enables an 
interpretation of recursion and higher-order functions; these were not considered 
in [18]. Additionally, here we account for the fact that data flow in probabilistic 
programming is not in general a transitive relation. 

While there is some work in setting up the right notion of event structure, the 
standard methods of concurrent game semantics adapt well to this setting. This 
is not surprising, as event structures and games are known to be resistant to the 
addition of extra structure, see e.g. [21,5,15]. One difficulty is to correctly define 
composition, keeping track of potential hidden data dependencies. In summary: 


— We introduce a general notion of Bayesian event structure, modelling control 
flow, data flow, and probability. 

— We set up a compositional framework for these event structures based on con- 
current games. Specifically, we define a category BG of arenas and Bayesian 
strategies, and give a description of its abstract properties. 

— We give a denotational semantics for a higher-order statistical language. Our 
semantics gives an operationally intuitive representation for programs and 
their data flow structure, while only relying on standard mathematical tools. 


Paper outline. We start by recalling the basics of probability and Bayesian 
networks, and we then describe the syntax of our language (Sec. 2). In Sec. 3, 
we introduce event structures and Bayesian event structures, and informally 
describe our semantics using examples. In Sec. 4 we define our category of arenas 
and strategies, which we apply to the denotational semantics of the language in 
Sec. 5. We give some context and perspectives in Sec. 6. 


Acknowledgements. I am grateful to Simon Castellan, Mathieu Huot and Philip 
Saville for helpful comments on early versions of this paper. This work was 
supported by grants from EPSRC and the Royal Society. 


2 Probability distributions, Bayesian networks, and 
probabilistic programming 


2.1 Probability and measure 


We recall the basic notions, see e.g. [8] for a reference. 


Measures. A measurable space is a set $X$ equipped with a σ-algebra, that is,
a set $\Sigma_X$ of subsets of $X$ containing $X$ itself, and closed under complements
and countable unions. The elements of $\Sigma_X$ are called measurable subsets of
$X$. An important example of measurable space is the set $\mathbb{R}$ equipped with its
σ-algebra $\Sigma_{\mathbb{R}}$ of Borel sets, the smallest one containing all intervals. Another
basic example is the discrete space $\mathbb{N}$, in which all subsets are measurable.

A measure on $(X, \Sigma_X)$ is a function $\mu : \Sigma_X \to [0, \infty]$ which is countably
additive, i.e. $\mu(\biguplus_{i \in I} U_i) = \sum_{i \in I} \mu(U_i)$ for $I$ countable, and satisfies $\mu(\emptyset) = 0$. A
fundamental example is the Lebesgue measure $\lambda$ on $\mathbb{R}$, defined on intervals as
$\lambda([a, b]) = b - a$ and extended to all Borel sets. Another example (for arbitrary
$X$) is the Dirac measure at a point $x \in X$: for any $U \in \Sigma_X$, $\delta_x(U) = 1$ if
$x \in U$, $0$ otherwise. A sub-probability measure on $(X, \Sigma_X)$ is a measure $\mu$
satisfying $\mu(X) \le 1$.
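On a finite discrete space, where every subset is measurable, these definitions can be checked directly. The following sketch is an illustration under that finiteness assumption; `FiniteMeasure` and `dirac` are hypothetical names, and exact fractions are used so that additivity can be tested without rounding.

```python
from fractions import Fraction
from typing import Dict, Hashable, Iterable


class FiniteMeasure:
    """A measure on a finite discrete space, given by non-negative point weights."""

    def __init__(self, weights: Dict[Hashable, Fraction]):
        if any(w < 0 for w in weights.values()):
            raise ValueError("weights must be non-negative")
        self.weights = dict(weights)

    def __call__(self, subset: Iterable[Hashable]) -> Fraction:
        # The measure of a set is the sum of the weights of its points.
        return sum((self.weights.get(x, Fraction(0)) for x in set(subset)), Fraction(0))

    def is_sub_probability(self, space: Iterable[Hashable]) -> bool:
        return self(space) <= 1


def dirac(x: Hashable) -> FiniteMeasure:
    """Dirac measure at x: a set has measure 1 if it contains x, and 0 otherwise."""
    return FiniteMeasure({x: Fraction(1)})


mu = FiniteMeasure({0: Fraction(1, 4), 1: Fraction(1, 2)})
print(mu({0, 1}), mu(set()), dirac(3)({1, 3}))
```

Additivity on disjoint sets holds by construction here, since the measure of a set is a sum over its points.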

A function $f : X \to Y$ is measurable if $U \in \Sigma_Y \Rightarrow f^{-1}U \in \Sigma_X$. Given
a measure $\mu$ on a space $X$ and a measurable function $f : X \to \mathbb{R}$, for every
measurable subset $U$ of $X$ we can define the integral $\int_U \mathrm{d}\mu\, f$, an element of
$\mathbb{R} \cup \{\infty\}$. This construction yields a measure on $X$. (Many well-known probability
distributions on the reals arise in this way from their density.)


Kernels. We will make extensive use of kernels, which can be seen as parametrised
families of measures. Formally a kernel from $X$ to $Y$ is a map $k : X \times \Sigma_Y \to
[0, \infty]$ such that for every $x \in X$, $k(x, -)$ is a measure on $Y$, and for every
$V \in \Sigma_Y$, $k(-, V)$ is a measurable function. It is a sub-probability kernel if
each $k(x, -)$ is a sub-probability measure, and it is an s-finite kernel if it is a
countable (pointwise) sum of sub-probability kernels. Every measurable function
$f : X \to Y$ induces a Dirac kernel $\delta_f : X \rightsquigarrow Y : x \mapsto \delta_{f(x)}$. Kernels compose:
if $k : X \rightsquigarrow Y$ and $h : Y \rightsquigarrow Z$ then the map $h \circ k : X \times \Sigma_Z \to [0, \infty]$ defined as
$(x, W) \mapsto \int_Y \mathrm{d}k(x, -)\, h(-, W)$ is also a kernel, and the Dirac kernel $\delta_{\mathrm{id}}$ (often
just $\delta$) is an identity for this composition. We note that if both $h$ and $k$ are
sub-probability kernels, then $h \circ k$ is a sub-probability kernel. Finally, observe
that a kernel $1 \rightsquigarrow X$, for $1$ a singleton space, is the same thing as a measure on
$X$.
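For finite discrete spaces the integral in the composition formula becomes a finite sum, $(h \circ k)(x, \{z\}) = \sum_y k(x, \{y\})\, h(y, \{z\})$, which is matrix multiplication. The sketch below illustrates this special case only; the dictionary encoding and the names `compose` and `dirac_kernel` are assumptions made for the example.

```python
from fractions import Fraction
from typing import Callable, Dict, Hashable, Iterable

# A finite kernel: for each x, the point weights of the measure k(x, -).
Kernel = Dict[Hashable, Dict[Hashable, Fraction]]


def compose(h: Kernel, k: Kernel) -> Kernel:
    """Composition h o k on finite spaces: sum over the intermediate point y."""
    result: Kernel = {}
    for x, row in k.items():
        out: Dict[Hashable, Fraction] = {}
        for y, k_xy in row.items():
            for z, h_yz in h.get(y, {}).items():
                out[z] = out.get(z, Fraction(0)) + k_xy * h_yz
        result[x] = out
    return result


def dirac_kernel(f: Callable, domain: Iterable[Hashable]) -> Kernel:
    """The Dirac kernel induced by a function f: x is sent to the Dirac measure at f(x)."""
    return {x: {f(x): Fraction(1)} for x in domain}


k = {"a": {0: Fraction(1, 2), 1: Fraction(1, 4)}}
h = {0: {"u": Fraction(1, 2)}, 1: {"u": Fraction(1, 2), "v": Fraction(1, 2)}}
hk = compose(h, k)
print(hk)
```

In this example both kernels are sub-probability kernels, and the total mass of the composite at "a" is 1/2, consistent with the closure property noted above.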

In this paper we will refer to the bernoulli, normal, and uniform families
of distributions; all of these are sub-probability kernels from their parameter
spaces to $\mathbb{N}$ or $\mathbb{R}$. For example, there is a kernel $\mathbb{R}^2 \rightsquigarrow \mathbb{R} : ((x, y), U) \mapsto
\mu_{\mathcal{N}(x,y)}(U)$, where $\mu_{\mathcal{N}(x,y)}$ is the measure associated with a normal distribution
with parameters $(x, y)$, if $y > 0$, and the $0$ measure otherwise. We understand
the bernoulli distribution as returning either $0$ or $1 \in \mathbb{N}$.
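As a concrete reading of this convention, the sketch below evaluates the normal kernel on intervals $(a, b]$ and the bernoulli kernel on finite sets of naturals. Several points are assumptions for illustration: the function names are hypothetical, the second normal parameter is taken to be a standard deviation, and the bernoulli kernel is assumed to return the zero measure for parameters outside $[0, 1]$ (the text only states this fallback for the normal family).

```python
import math
from typing import Iterable


def normal_kernel(x: float, y: float, a: float, b: float) -> float:
    """Mass assigned to the interval (a, b]; the zero measure when y <= 0."""
    if y <= 0:
        return 0.0

    def cdf(t: float) -> float:
        return 0.5 * (1.0 + math.erf((t - x) / (y * math.sqrt(2.0))))

    return cdf(b) - cdf(a)


def bernoulli_kernel(p: float, subset: Iterable[int]) -> float:
    """Mass assigned to a set of naturals; outcomes are 0 (weight 1 - p) and 1 (weight p)."""
    if not 0.0 <= p <= 1.0:
        return 0.0  # assumption: invalid parameters give the zero measure
    return sum(p if n == 1 else 1.0 - p for n in set(subset) if n in (0, 1))


print(normal_kernel(2.0, 1.0, 1.0, 3.0), normal_kernel(2.0, -1.0, 1.0, 3.0))
```

Returning the zero measure for invalid parameters is what makes these families total maps with sub-probability (rather than probability) measures as values.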


Product spaces and independence. When several random quantities are under
study one uses the notion of product space: given $(X, \Sigma_X)$ and $(Y, \Sigma_Y)$ we
can equip the set $X \times Y$ with the product σ-algebra, written $\Sigma_{X \times Y}$, defined as
the smallest one containing $U \times V$, for $U \in \Sigma_X$ and $V \in \Sigma_Y$.

A measure $\mu$ on $X \times Y$ gives rise to marginals $\mu_X$ and $\mu_Y$, measures on $X$
and $Y$ respectively, defined by $\mu_X(U) = \mu(U \times Y)$ and $\mu_Y(V) = \mu(X \times V)$ for
$U \in \Sigma_X$ and $V \in \Sigma_Y$.

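Given kernels $k : X \rightsquigarrow Y$ and $h : Z \rightsquigarrow W$ we define the product kernel
$k \times h : X \times Z \rightsquigarrow Y \times W$ via iterated integration:

$$((x, z), U) \mapsto \int_{y \in Y} \mathrm{d}k(x, -) \int_{w \in W} \mathrm{d}h(z, -)\, \chi_U(y, w),$$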

524 H. Paquet 


where $\chi_U$ is the characteristic function of $U \in \Sigma_{Y \times W}$. When $X = Z = 1$ this
gives the notion of product measure.

The definitions above extend with no difficulty to product spaces $\prod_{i \in I} X_i$.
A measure $P$ on $\prod_{i \in I} X_i$ has marginals $P_J$ for any $J \subseteq I$, and we say that $X_i$
and $X_j$ are independent w.r.t. $P$ if the marginal $P_{i,j}$ is equal to the product
measure $P_i \times P_j$.
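For a finite product of two discrete spaces, marginals are row and column sums, and independence can be tested pointwise. The sketch below illustrates only that finite case, for probability measures; the names `marginal` and `independent` and the two example distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], Fraction]


def marginal(joint: Joint, axis: int) -> Dict[Hashable, Fraction]:
    """Marginal on one component: sum the joint weights over the other component."""
    out: Dict[Hashable, Fraction] = {}
    for point, weight in joint.items():
        out[point[axis]] = out.get(point[axis], Fraction(0)) + weight
    return out


def independent(joint: Joint) -> bool:
    """True if the joint measure equals the product of its two marginals."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((x, y), Fraction(0)) == px[x] * py[y]
               for x, y in product(px, py))


quarter, half = Fraction(1, 4), Fraction(1, 2)
two_fair_coins = {(a, b): quarter for a in (0, 1) for b in (0, 1)}
copied_coin = {(0, 0): half, (1, 1): half}
print(independent(two_fair_coins), independent(copied_coin))
```

The second example has the same marginals as the first, which shows that marginals alone do not determine the joint measure.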


2.2 Bayesian networks 


An efficient way to define measures on product spaces is using probabilistic 
graphical models [37], for example Bayesian networks, whose definition we briefly 
recall now. The idea is to use a graph structure to encode a set of independence 
constraints between the components of a product space. We recall the definition 
of conditional independence. With respect to a joint distribution $P$ on $\prod_{i \in I} X_i$,
we say $X_i$ and $X_j$ are conditionally independent given $X_k$ if there exists a
kernel $k : X_k \rightsquigarrow X_i \times X_j$ such that $P_{i,j,k}(U_i \times U_j \times U_k) = \int_{U_k} k(-, U_i \times U_j)\, \mathrm{d}P_k$
for all measurable $U_i, U_j, U_k$, and $X_i$ and $X_j$ are independent w.r.t. $k(x_k, -)$ for
all $x_k \in X_k$. In this definition, $k$ is a conditional distribution of $X_i \times X_j$ given
$X_k$ (w.r.t. $P$); under some reasonable conditions [8] this always exists, and the
independence condition is the main requirement. 

Adapting the presentation used in [27], we define a Bayesian network as
a directed acyclic graph $G = (V, \dashrightarrow)$ where each node $v \in V$ is assigned a
measurable space $M(v)$. We define the parents $pa(v)$ of $v$ to be the set of
nodes $u$ with $u \dashrightarrow v$, and its non-descendants $nd(v)$ to contain the nodes $u$
such that there is no path $v \dashrightarrow \cdots \dashrightarrow u$. Writing $M(S) = \prod_{v \in S} M(v)$ for
any subset $S \subseteq V$, a measure $P$ on $M(V)$ is said to be compatible with $G$
if for every $v \in V$, $M(v)$ and $M(nd(v))$ are independent given $M(pa(v))$. It
is straightforward to verify that given a Bayesian network $G$, we can construct
a compatible measure by supplying for every $v \in V$, an s-finite kernel $k_v :
M(pa(v)) \rightsquigarrow M(v)$.
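On finite spaces, the construction of a measure from per-node kernels amounts to multiplying, for each joint assignment, the kernel weights of every node given its parents. The sketch below illustrates this for a toy three-node network shaped like the weight example; all names, the two-point spaces and the kernel weights are assumptions, and the check covers only one instance of the compatibility condition (meas1 and meas2 given weight).

```python
from fractions import Fraction
from itertools import product


def joint_measure(order, parents, spaces, kernels):
    """Weight of each joint assignment: product over nodes of k_v(parent values)(value)."""
    joint = {}
    for values in product(*(spaces[v] for v in order)):
        assignment = dict(zip(order, values))
        weight = Fraction(1)
        for v in order:
            pa = tuple(assignment[u] for u in parents[v])
            weight *= kernels[v](pa).get(assignment[v], Fraction(0))
        joint[values] = weight
    return joint


def conditionally_independent(joint, given=0, left=1, right=2):
    """Check that components `left` and `right` are independent on every fibre of `given`."""
    for g in {p[given] for p in joint}:
        fibre = {p: w for p, w in joint.items() if p[given] == g}
        total = sum(fibre.values(), Fraction(0))
        for p, w in fibre.items():
            pl = sum((v for q, v in fibre.items() if q[left] == p[left]), Fraction(0))
            pr = sum((v for q, v in fibre.items() if q[right] == p[right]), Fraction(0))
            if w * total != pl * pr:  # unnormalised form of P(l, r | g) = P(l | g) P(r | g)
                return False
    return True


def prior(_pa):
    return {0: Fraction(1, 2), 1: Fraction(1, 2)}


def noisy_copy(pa):
    (w,) = pa
    return {w: Fraction(3, 4), 1 - w: Fraction(1, 4)}


order = ["weight", "meas1", "meas2"]
parents = {"weight": [], "meas1": ["weight"], "meas2": ["weight"]}
spaces = {v: [0, 1] for v in order}
kernels = {"weight": prior, "meas1": noisy_copy, "meas2": noisy_copy}
joint = joint_measure(order, parents, spaces, kernels)
print(joint[(1, 1, 1)], conditionally_independent(joint))
```

The two measurements are correlated in the joint measure, since both track weight, yet they become independent once weight is fixed, which is the constraint the graph encodes.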

(In practice, Bayesian networks are used to represent probabilistic models,
and so typically every kernel $k_v$ is strictly probabilistic. Here the $k_v$ are only
required to be s-finite, so they are in general unnormalised. As we will see, this
is because we consider possibly conditioned models.)

Bayesian networks are an elegant way of constructing models, but they are 
limited. We now present a programming language whose expressivity goes be- 
yond them. 


2.3 A language for probabilistic modelling 


Our language of study is a call-by-value statistical language with sums, products, 
and higher-order types, as well as recursive functions. Languages with compara- 
ble features are considered in [11,57,40]. 

The syntax of this language is described in Fig. 1. Note the distinction between 
general terms M, N and values V. The language includes the usual term 
constructors and pattern matching. Base types are the unit type, the real numbers 
and the natural numbers, and for each of them there are associated constants. 
The language is parametrised by a set L of labels, a set F of partial 
measurable functions R^n → R or R^n → N, and a set D of standard distribution 
families, which are sub-probability kernels¹ R^n ⇝ R or R^n ⇝ N. There is also 
a primitive score which multiplies the weight of the current trace by the value 
of its argument. This is an idealised form of conditioning via soft constraints, 
which justifies the move from sub-probability to s-finite kernels (see [54]). 


A, B ::= 1 | N | R | A × B | A + B | A → B 

V, W ::= () | n | r | f | (V, W) | inl V | inr V | λx.M 

M, N ::= V | x | M N | M =? 0 | μx : A → B. M | sample_ℓ dist(M_1, ..., M_n) 
       | (M, N) | match M with (x, y) → P | score M 
       | inl M | inr M | match M with [inl x → N_1 | inr x → N_2] 


Fig. 1: Syntax. 


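   r ∈ R           Γ ⊢ M : N            Γ ⊢ M : R          Γ, x : A → B ⊢ M : A → B 
-----------     ----------------     -----------------     --------------------------- 
 Γ ⊢ r : R      Γ ⊢ M =? 0 : B       Γ ⊢ score M : 1       Γ ⊢ μx : A → B. M : A → B 


 (f : R^n ⇀ X) ∈ F       (dist : R^n ⇝ X) ∈ D    for i = 1, ..., n, Γ ⊢ M_i : R    ℓ ∈ L 
-------------------     ----------------------------------------------------------------- 
 Γ ⊢ f : R^n → X                    Γ ⊢ sample_ℓ dist(M_1, ..., M_n) : X 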


Fig. 2: Subset of typing rules. 


Terms of the language are typed in the standard way; in Fig. 2 we present 
a subset of the rules which could be considered non-standard. We use X to 
stand for either N or R, and we do not distinguish between the type and the 
corresponding measurable space. We also write B for 1 + 1, and use syntactic 
sugar for let-bindings, sequencing, and conditionals: 


let x : A = M in N  :=  (λx : A. N) M 
M; N  :=  let x : A = M in N   (for x not free in N) 
if M then N_1 else N_2  :=  match M with [inl x → N_1 | inr x → N_2] 
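
To make the three abbreviations concrete, here is a minimal Python sketch of them as functions on a small term representation. The class names, the string encoding of types, and the placeholder variable name are assumptions made for the illustration; they are not part of the language definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    ty: str
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Match:
    scrutinee: object
    left_var: str
    left: object
    right_var: str
    right: object


def let(x, ty, m, n):
    """let x : ty = m in n  is  (lambda x : ty. n) m."""
    return App(Lam(x, ty, n), m)


def seq(m, n, ty="1", unused="_"):
    """m; n  is a let-binding whose variable must not occur free in n."""
    return let(unused, ty, m, n)


def if_then_else(m, n1, n2, unused="_"):
    """if m then n1 else n2  is a match on the two injections of 1 + 1."""
    return Match(m, unused, n1, unused, n2)
```

The caller is responsible for choosing a placeholder variable that is not free in the continuation, as required by the side condition on sequencing.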


3 Programs as event structures 


In this section, we introduce our causal approach. We give a series of examples 
illustrating how programs can be understood as graph-like structures known 
as event structures, of which we assume no prior knowledge. Event structures 
were introduced by Winskel et al. [46], though for the purposes of this work the 
traditional notion must be significantly enriched. 


1 In any practical instance of the language it would be expected that every kernel in 
D has a density in F, but this is not strictly necessary here. 

let weight = sample_weight normal(2, 1) in 
sample_meas1 normal(weight, 0.1); 
sample_meas2 normal(weight, 0.1); () 

[Diagram: event structure with nodes for weight, meas1 : R, meas2 : R, and the output () : 1.] 


Fig. 3 


The examples which follow are designed to showcase the following features 
of the semantics: combination of data flow and control flow with probability 
(Sec. 3.1), conditional branching (Sec. 3.2), open programs with multiple argu- 
ments (Sec. 3.3) and finally higher-order programs (Sec. 3.4). We will then give 
further definitions in Sec. 3.5 and Sec. 3.6. 

Our presentation in Sec. 3.1 and Sec. 3.2 is intended to be informal; we give 
all the necessary definitions starting from Sec. 3.3. 


3.1 Control flow, data flow, and probability 


We briefly recall the example of the introduction; the program and its semantics 
are given in Fig. 3. As before, → represents control flow, and --> represents 
data flow. There is a node for each random choice in the program, and the 
dependency relationships are pictured using the appropriate arrows. Naturally, 
a data dependency imposes constraints on the control flow: every arrow --> 
must be realised by a control flow path →*. There is an additional node for the 
output value, drawn in a dashed box, which indicates that it is a possible point 
of interaction with other programs. This will be discussed in Sec. 3.3. 

Although this is not pictured in the above diagram, the semantics also comprises 
a family of kernels, modelling the probabilistic execution according to 
the distributions specified by the program. Intuitively, each node has a distribution 
whose parameters are its parents for the relation -->. For example, the 
node labelled meas2 will be assigned a kernel k_meas2 : R ⇝ R defined so that 
k_meas2(weight, −) is a normal distribution with parameters (weight, 0.1). 
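
The following Python sketch illustrates this reading of the program of Fig. 3: each node carries a kernel whose arguments are the values of its data flow parents, and nodes are visited in an order compatible with control flow. It is a simulation written for this explanation, not the semantics itself, and it assumes that the second parameter of normal is a standard deviation, which the text does not specify.

```python
import random

# Data flow parents of each random choice.
PARENTS = {"weight": [], "meas1": ["weight"], "meas2": ["weight"]}

# One kernel per node: it receives the parents' values and returns a sample.
KERNELS = {
    "weight": lambda: random.gauss(2.0, 1.0),
    "meas1": lambda weight: random.gauss(weight, 0.1),
    "meas2": lambda weight: random.gauss(weight, 0.1),
}

# A linearisation of the control flow of the program.
CONTROL_ORDER = ["weight", "meas1", "meas2"]


def run():
    """Sample every node in control flow order, feeding each kernel its parents."""
    values = {}
    for node in CONTROL_ORDER:
        args = [values[p] for p in PARENTS[node]]
        values[node] = KERNELS[node](*args)
    return values
```

Because every data dependency is realised by a control flow path, the parents of a node have always been sampled by the time its kernel is called.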


3.2 Branching 


Consider a modified scenario in which only one measurement is performed, but 
with probability 0.01 an error occurs and the scales display a random number 
between 0 and 10. The corresponding program and its semantics are given in 
Fig. 4. 

In order to represent the conditional statement we have introduced a new 
element to the graph: a binary relation known as conflict, pictured ~~, and 
indicating that two nodes are incompatible and any execution of the program 
will only encounter one of them. Conflict is hereditary, in the sense that the 
respective futures of two nodes in conflict are also incompatible. Hence we need 
two copies of (); one for each branch of the conditional statement. Unsurprisingly, 
beyond the branching point all events depend on error, since their very existence 
depends on its value. 


let weight = sample_weight normal(2, 1) in 
let error = sample_error bernoulli(0.01) in 
if error =? 0 
then sample_meas uniform(0, 10) 
else sample_meas normal(weight, 0.1); () 

[Diagram: event structure with nodes weight : R and error, followed by two conflicting branches, each ending in a copy of the output () : 1.] 


Fig. 4 


We continue our informal presentation with a description of the semantics 
of open terms. This will provide enough context to formally define the notion 
of event structure we use in this paper, which differs from others found in the 
literature. 


3.3 Programs with free variables 


We turn the example in Sec. 3.2 into one involving two free variables, guess and 
rate, used as parameters for the distributions of weight and error, respectively. 
These allow the same program to serve as a model for different situations. Formally 
we have a term M such that guess : R, rate : R ⊢ M : 1, given in Fig. 5 
with its semantics. We see that the two parameters are themselves represented 
by nodes, drawn in dotted boxes, showing that (like the output nodes) they are a 
point of interaction with the program’s external environment; this time, a value 
is received rather than sent. Below, we will distinguish between the different 
types of nodes by means of a polarity function. 


let weight = sample_weight normal(guess, 1) in 
let error = sample_error bernoulli(rate) in 
if error =? 0 
then sample_meas uniform(0, 10) 
else sample_meas normal(weight, 0.1); () 

[Diagram: the event structure of Fig. 4 extended with parameter nodes guess : R and rate : R.] 


Fig. 5 


We attach to the parameter nodes the appropriate data dependency arrows. 
The subtlety here is with control flow: while it is clear that parameter values must 
be obtained before the start of the execution, and that necessarily guess → weight 
and rate → weight, it is less clear what relationship guess and rate should have 
with each other. 

In a call-by-value language, we find that leaving program arguments causally 
independent (of each other) leads to soundness issues. But it would be equally 
unsound to impose a causal order between them. Therefore, we introduce a 
form of synchronisation relation, amounting to having both guess → rate and 
rate → guess, but we write guess -- rate instead. In event structure terminology 
this is known as a coincidence, and was introduced by [19] to study the 
synchronous π-calculus. Note that in many approaches to call-by-value games 
(e.g. [31,26]) one would bundle both parameters into a single node representing 
the pair (guess, rate), but this is not suitable here since our data flow analysis 
requires separate nodes. 

We proceed to define event structures, combining the ingredients we have 
described so far: control dependency, data dependency, conflict, and coincidence, 
together with a polarity function, used implicitly above to distinguish between 
input nodes (−), output nodes (+), and internal random choices (0). 


Definition 1. An event structure E is a set E of events (or nodes) together 
with the following structure: 


— A control flow preorder ≤ on E, such that each event has a finite 
history: for every e ∈ E, the set [e] := {e' ∈ E | e' ≤ e} is finite. This preorder 
is designed to be generated from the immediate dependency relation → 
and the coincidence relation --, which can both be recovered from ≤, 
as follows: we write e -- e' when e and e' are equivalent in the preorder, 
i.e. e ≤ e' and e' ≤ e; and e → e' whenever the following holds: e ≤ e', 
¬(e' ≤ e), and if e ≤ d ≤ e' then either d -- e or d -- e'. 

— An irreflexive, binary conflict relation # on E, which is hereditary: if 
e ≤ e' and e # d then e' # d. Observe that this applies when e -- e'. The 
minimal conflict relation ~~ (typically used in diagrams) is defined as 
follows: e ~~ d if e # d, but for every d_0 < d and e_0 < e, ¬(e # d_0) and 
¬(e_0 # d). 

— An irreflexive, binary data flow relation --> on E, such that if e --> e' 
then e ≤ e' and ¬(e -- e'). Note that this is not required to be transitive. 

— A polarity function pol : E → {+, 0, −}, such that if e -- e' then pol(e) = 
pol(e') ≠ 0. 

— A labelling function lbl : E_0 → L, defined on the set E_0 := {e ∈ E | 
pol(e) = 0}. 


Often we write E instead of the whole tuple (E, ≤, #, -->, pol). It is sometimes 
useful to quotient out coincidences: we write E_{--} for the set of equivalence 
classes for --, which we denote as boldface letters (𝐞, 𝐚, 𝐬, ...). It is easy to check that 
this is also an event structure with 𝐞 ≤ 𝐞' (resp. #, -->) if there is e ∈ 𝐞 and 
e' ∈ 𝐞' with e ≤ e' (resp. #, -->), and evident polarity function. 
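
As an illustration of how coincidence and immediate dependency are recovered from the control flow preorder, consider the following Python sketch. The representation of the preorder as a set of pairs, the helper names, and the four-event example (a simplified fragment of the program of Sec. 3.3) are assumptions made for this example only.

```python
from itertools import product


def preorder_closure(events, pairs):
    """Reflexive-transitive closure of a set of generating pairs."""
    leq = {(e, e) for e in events} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


def coincident(leq, e, f):
    """e -- f: e and f are equivalent in the preorder."""
    return (e, f) in leq and (f, e) in leq


def immediate(leq, events, e, f):
    """e -> f: e is strictly below f and nothing lies strictly in between."""
    if (e, f) not in leq or (f, e) in leq:
        return False
    return all(
        coincident(leq, d, e) or coincident(leq, d, f)
        for d in events
        if (e, d) in leq and (d, f) in leq
    )


EVENTS = {"guess", "rate", "weight", "error"}
LEQ = preorder_closure(
    EVENTS,
    {("guess", "rate"), ("rate", "guess"), ("guess", "weight"), ("weight", "error")},
)
```

In this example guess and rate are coincident, both immediately precede weight, and neither immediately precedes error, because weight lies strictly in between.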

We will see in Sec. 3.5 how this structure can be equipped with quantitative 
information (in the form of measurable spaces and kernels). Before discussing 
higher-order programs, we introduce the fundamental concept of configuration, 
which will play an essential role in the technical development of this paper. 

Definition 2. A configuration of E is a finite subset x ⊆ E which is down-closed 
(if e ≤ e' and e' ∈ x then e ∈ x) and conflict-free (if e, e' ∈ x then 
¬(e # e')). The set of all configurations of E is denoted 𝒞(E) and it is a 
partial order under ⊆. 


We introduce some important terminology. For an event e ∈ E, we have defined 
its history [e] above. This is always a configuration of E, and the smallest 
one containing e. More generally we can define [𝐞] = {e' | ∀e ∈ 𝐞. e' ≤ e}, and 
[𝐞) = [𝐞] \ 𝐞. 

The covering relation −⊂ defines the smallest non-trivial extensions to a 
configuration; it is defined as follows: x −⊂ y if there is 𝐞 ∈ E_{--} such that x ∩ 𝐞 = ∅ 
and y = x ∪ 𝐞. We will sometimes write x −⊂^𝐞 y. We sometimes annotate −⊂ 
and ⊆ with the polarities of the added events: so for instance x ⊆^{+,0} y if each 
e ∈ y \ x has polarity + or 0. 
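
The two conditions of Definition 2 are easy to check on a finite example. The sketch below does so in Python for a simplified version of the branching program of Sec. 3.2; the event names meas_uniform and meas_normal, and the explicit listing of the preorder, are assumptions made for the illustration.

```python
def history(leq, e):
    """[e]: all events below e in the control flow preorder."""
    return {d for (d, f) in leq if f == e}


def is_configuration(x, leq, conflict):
    """Check that x is down-closed and conflict-free."""
    down_closed = all(d in x for e in x for d in history(leq, e))
    conflict_free = all((e, f) not in conflict for e in x for f in x)
    return down_closed and conflict_free


EVENTS = {"weight", "error", "meas_uniform", "meas_normal"}
LEQ = {(e, e) for e in EVENTS} | {
    ("weight", "error"),
    ("weight", "meas_uniform"),
    ("weight", "meas_normal"),
    ("error", "meas_uniform"),
    ("error", "meas_normal"),
}
# Conflict is symmetric: the two measurements belong to different branches.
CONFLICT = {("meas_uniform", "meas_normal"), ("meas_normal", "meas_uniform")}
```

A configuration here corresponds to a partial execution that has committed to at most one branch of the conditional.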


3.4 Higher-order programs 


We return to a fairly informal presentation; our goal now is to convey intuition 
about the representation of higher-order programs in the framework of event 
structures. We will see in Sec. 4 how this representation is obtained from the 
usual categorical approach to denotational semantics. 

Consider yet another faulty-scales scenario, in which the probability of error 
now depends on the object’s weight. Suppose that this dependency is not known 
by the program, and thus left as a parameter rate : R → R. The resulting 
program has type rate : R → R, guess : R ⊢ R, as follows: 


let weight = sample_weight normal(guess, 1) in 
let error = sample_error bernoulli(rate weight) in error 


We give its semantics in Fig. 6. (To keep things simple this scenario involves no 
measurements.) 

It is an important feature of the semantics presented here that higher-order 
programs are interpreted as causal structures involving only values of ground 
type. In the example, the argument rate is initially received not as a 
mathematical function, but as a single message of unit type (labelled λ^rate), 
which gives the program the possibility to call the function rate by feeding it 
an input value. Because the behaviour of rate is unknown, its output is treated 
as a new argument to the program, represented by the negative out node. The 
shaded region highlights the part of computation during which the program 
interacts with its argument rate. The semantics accommodates the possibility 
that rate itself has internal random choices; this will be accounted for in the 
compositional framework of Sec. 4. 

[Diagram omitted.] 

Fig. 6 


3.5 Bayesian event structures 

We show now that event structures admit a probabilistic enrichment.² 


Definition 3. A measurable event structure is an event structure together 
with the assignment of a measurable space M(e) for every event e ∈ E. For any 
X ⊆ E we set M(X) = ∏_{e∈X} M(e). 


As is common in statistics, we often call e (or X) an element of M(e) (or 
M(X)). We now proceed to equip this with a kernel for each event. 


Definition 4. For E an event structure and e ∈ E, we define the parents pa(e) 
of e as {d ∈ E | d --> e}. 


Definition 5. A quantitative event structure is a measurable event structure 
E with, for every non-negative e ∈ E, a kernel k_e : M(pa(e)) ⇝ M(e). 


Our Bayesian event structures are quantitative event structures satisfying 
an additional axiom, which we introduce next. This axiom is necessary for a 
smooth combination of data flow and control flow; without it, the compositional 
framework of the next section is not possible. 


Definition 6. Let E be a quantitative event structure. We say that e ∈ E is 
non-uniform if there are distinct pa(e), pa'(e) ∈ M(pa(e)) such that 


k_e(pa(e), M(e)) ≠ k_e(pa'(e), M(e)). 

We finally define: 


Definition 7. A Bayesian event structure is a quantitative event structure 
such that if e ∈ E is non-uniform, and e ≤ e' with e and e' not coincident, then 
pa(e) ⊆ pa(e'). 

The purpose of this condition is to ensure that Bayesian event structures support 
a well-behaved notion of “hiding”, which we will define in the next section. 
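
For intuition, Definitions 6 and 7 can be checked mechanically when all spaces are finite. The Python sketch below represents a kernel as a dictionary from parent values to dictionaries of weights; this finite encoding, the function names, and the three-event example are assumptions of the sketch, not part of the definitions.

```python
def total_mass(kernel, parent_value):
    """k_e(parent_value, M(e)): the total weight assigned by the kernel."""
    return sum(kernel[parent_value].values())


def is_non_uniform(kernel):
    """True when two parent values give different total mass."""
    return len({total_mass(kernel, value) for value in kernel}) > 1


def is_bayesian(events, leq, parents, kernels):
    """Every event strictly above a non-uniform e must have all parents of e."""
    for e in events:
        if e not in kernels or not is_non_uniform(kernels[e]):
            continue
        for f in events:
            strictly_below = (e, f) in leq and (f, e) not in leq
            if strictly_below and not parents[e] <= parents[f]:
                return False
    return True


EVENTS = ["weight", "obs", "out"]
LEQ = {(e, e) for e in EVENTS} | {("weight", "obs"), ("obs", "out"), ("weight", "out")}
# A score-like event whose total mass depends on the value of its parent.
KERNELS = {"obs": {1.0: {"()": 0.25}, 2.0: {"()": 0.5}}}
PARENTS_OK = {"weight": set(), "obs": {"weight"}, "out": {"weight"}}
PARENTS_BAD = {"weight": set(), "obs": {"weight"}, "out": set()}
```

In the second parent assignment the event out follows the non-uniform event obs without depending on weight, which is exactly the situation ruled out by Definition 7.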


3.6 Symmetry 


For higher-order programs, event structures in the sense of Definition 1 present 
a limitation. This has to do with the possibility for a program to call a function 
argument more than once, which the compositional framework of Sec. 4 does not 
readily support. We will use a linear logic-inspired “!” to duplicate nodes, thus 
making certain configurations available in infinitely many copies. The following 
additional structure, called symmetry, is there to enforce that these configura- 
tions yield equivalent behaviour. 


2 We emphasise that our notion of “event” is not related to the usual notion of event 
in probability theory. 

Definition 8 (Winskel [61]). A symmetry on an event structure E is a family 
≅_E of bijections θ : x ≅ y, with x, y ∈ 𝒞(E), containing all identity bijections 
and closed under composition and inverses, satisfying the following axioms. 


— For each θ : x ≅ y in ≅_E, if x ⊆ x' then there is a bijection θ' : x' ≅ y' in 
≅_E, such that θ ⊆ θ'. The analogous property is required for every restriction 
x' ⊆ x. 

— Each θ ∈ ≅_E preserves polarity (pol(e) = pol(θ(e))), data flow (e --> e' ⟹ 
θ(e) --> θ(e')), and measurable structure (M(e) = M(θ(e))). 


We write θ : x ≅_E y if (θ : x ≅ y) ∈ ≅_E. When E is Bayesian, we additionally 
require k_e = k_{θ(e)} for every non-negative e ∈ x. (This is well-defined because θ 
preserves data flow and thus pa(θ(e)) = θ pa(e).) 


Although symmetry can be mathematically subtle, combining it with addi- 
tional data on event structures does not usually pose any difficulty [15,48]. 

In this section we have described Bayesian event structures with symmetry, 
which are the basic mathematical objects we use to represent programs. A central 
contribution of this paper is to define a compositional semantics, in which the 
interpretation of a program is obtained from that of its sub-programs. This is 
the topic of the next section. 


4 Games and Bayesian strategies 


The presentation is based on game semantics, a line of research in the semantics 
of programming languages initiated in [3,33], though the subject has earlier roots 
in the semantics of linear logic proofs (e.g. [10]). 

It is typical of game semantics that programs are interpreted as concrete 
computational trees, and that higher-order terms are described in terms of the 
possible interactions with their arguments. As we have seen in the examples 
of the previous section, this interaction takes the form of an exchange of first- 
order values. The central technical achievement of game semantics is to provide 
a method for composing such representations. 

To the reader not familiar with game semantics, the terminology may be 
misleading: the work of this paper hardly retains any connection to game theory. 
In particular there is no notion of winning. The analogy may be understood 
as follows for a given program of type Γ ⊢ M : A. There are two players: the 
program itself, and its environment. The “game”, which we study from the point 
of view of the program, takes place in the arena ⟦Γ ⊢ A⟧, which specifies which 
moves are allowed (calls to arguments in Γ, internal samples, return values in 
A, etc.). The semantics of M is a strategy (written ⟦M⟧), which specifies a 
plan of action for the program to follow in reaction to the moves played by the 
environment; this plan has to obey the constraints specified by the arena. 


4.1 An introduction to game semantics based on event structures 


There are many formulations of game semantics in the literature, with varying 
advantages. This paper proposes to use concurrent games, based on event 
structures, for reasoning about data flow in probabilistic programs. Originally 
introduced in [51] (though some important ideas appeared earlier: [25,44]), 
concurrent games based on event structures have been extensively developed and 
have found a range of applications. 

In Sec. 3, we motivated our approach by assigning event structures to programs; 
these event structures are examples of strategies, which we will shortly 
define. First we define arenas, which are the objects of the category we will 
eventually build. (The morphisms will be strategies.) 

Perhaps surprisingly, an arena is also defined as an event structure, though 
a much simpler one, with no probabilistic information, empty data dependency 
relation -->, and no neutral polarity events. We call this a simple event structure. 
This event structure does not itself represent any computation, but is simply 
there to constrain the shape of strategies, just as types constrain programs. 
Before giving the definition, we present in Fig. 7 the arenas associated with the 
strategies in Sec. 3.3 and Sec. 3.4, stating which types they represent. Note the 
copy indices (0, 1, ...) in Fig. 7b; these point to duplicated (i.e. symmetric) 
branches. 


[Diagrams of the two arenas.] 


(a) The arena ⟦R, R ⊢ 1⟧. (b) The arena ⟦R → R, R ⊢ R⟧. 


Fig. 7: Examples of arenas. 


Definition 9. An arena is a simple, measurable event structure with symmetry 
A = (A, ≅_A), together with two sub-symmetries ≅_A^− and ≅_A^+, subject to the 
following conditions: 


— A is a simple event structure which is alternating: if a → b then pol(a) ≠ 
pol(b); forest-shaped: if a ≤ b and c ≤ b then a ≤ c or c ≤ a (or both); and 
race-free: if a ~~ b then pol(a) = pol(b). 

— ≅_A, ≅_A^− and ≅_A^+ satisfy the axioms of thin concurrent games [17, 3.17]. 

— If a, a' are symmetric moves (i.e. there is θ ∈ ≅_A such that θ(a) = a') then 
M(a) = M(a'). 


Write init(A) for the set of initial events, i.e. those minimal for ≤. We say 
that A is positive if every a ∈ init(A) is positive. (Negative arenas are defined 
similarly.) We say that A is regular if whenever a, b ∈ init(A), either a -- b or 
a ~~ b. 


So, arenas provide a set of moves together with certain constraints for playing 
those moves. Our definition of strategy is slightly technical, but the various 
conditions ensure that strategies can be composed soundly; we will explore this 
second point in Sec. 4.2. 

For a strategy S to be well-defined relative to an arena A, each positive or 
negative move of S must correspond to a move of A; however neutral moves of S 
correspond to internal samples of the program; these should not be constrained 
by the type. Accordingly, a strategy comprises a partial map S ⇀ A defined 
precisely on the non-neutral events. The reader should be able to reconstruct 
this map for the examples of Sec. 3.3 and Sec. 3.4. 


Definition 10. A strategy on an arena A is a Bayesian event structure with 
symmetry S = (S, ≅_S), together with a partial function σ : S ⇀ A, whose 
domain of definition is exactly the subset {s ∈ S | pol(s) ≠ 0}, and such that 
whenever σ(s) is defined, M(σ(s)) = M(s) and pol(σ(s)) = pol(s). This data is 
subject to the following additional conditions: 


(1) σ preserves configurations: if x ∈ 𝒞(S) then σx ∈ 𝒞(A); and is locally 
injective: for s, s' ∈ x ∈ 𝒞(S), if σ(s) = σ(s') then s = s'. 

(2) σ is courteous: if s → s' in S and either pol(s) = + or pol(s') = −, then 
σ(s) → σ(s'). 

(3) σ preserves symmetry (θ : x ≅_S y ⟹ σθ : σx ≅_A σy), and it is ≅-receptive: 
if θ : x ≅_S y and σθ −⊂^− ψ ∈ ≅_A then there exists a unique 
θ' ∈ ≅_S such that θ −⊂^− θ' and σθ' = ψ; and thin: if x ∈ 𝒞(S) and 
id_x −⊂^{+,0} θ for some θ ∈ ≅_S, then θ = id_{x'} for some x' ∈ 𝒞(S). 

(4) If s --> s' in S, then pol(s') ≠ − and pol(s) ≠ +. 


Condition (1) amounts to σ being a map of event structures [60]. Combined with 
(2) and (3), we get the usual notion of a concurrent strategy on an arena with 
symmetry [17]; and finally (4) is a form of courtesy for -->. 

To these four conditions we add the following: 


Definition 11. A strategy S is innocent if conflict is local: s ~~ s' ⟹ [s) = 
[s'), and for every s ∈ S, the following conditions hold: 


— (backwards sequentiality) the history [s] is a total preorder; and 
— (forward sequentiality) if [s] −⊂^{s_1} and [s] −⊂^{s_2} and s_1 ≠ s_2, then s_1 ~~ s_2. 


Innocence [33,56,16] prevents any non-local or concurrent behaviour. It is 
typically used to characterise “purely functional” sequential programs, i.e. those 
using no state or control features. Here, we use innocence as a way to confine 
ourselves to a simpler semantic universe. In particular we avoid the need to deal 
with the difficulties of combining concurrency and probability [62]. 

In the rest of the paper, a Bayesian strategy is an innocent strategy in the 
sense of Definition 10 and Definition 11. 


4.2 Composition of strategies 


At this point, we have seen how to define arenas, and we have said that the 
event structures of Sec. 3 arise as strategies σ : S ⇀ A for an arena A. As usual 
in denotational semantics, these will be obtained compositionally, by induction 
on the syntax. For this we must move to a categorical setting, in which arenas 
are objects and strategies are morphisms. 


Strategies as morphisms. Before we introduce the notion of strategy from A 
to B we must introduce some important constructions on event structures. 


Definition 12. If A is an event structure, its dual A^⊥ is the event structure 
whose structure is the same as A but for polarity, which is defined as pol_{A^⊥}(a) = 
−pol_A(a). (Negative moves become positive, and vice-versa, with neutral moves 
not affected.) For arenas, we define (A, ≅_A, ≅_A^−, ≅_A^+)^⊥ = (A^⊥, ≅_A, ≅_A^+, ≅_A^−). 

Given a family (A_i)_{i∈I} of event structures with symmetry, we define their 
parallel composition to have events ∥_{i∈I} A_i = ⋃_{i∈I} A_i × {i} with polarity, 
conflict and both kinds of dependency obtained componentwise. Noticing that a 
configuration x ∈ 𝒞(∥_{i∈I} A_i) corresponds to ∥_{i∈I} x_i where each x_i ∈ 𝒞(A_i), and 
x_i = ∅ for all but finitely many i, we define the symmetry ≅_{∥_{i∈I} A_i} to contain 
bijections ∥_i θ_i : ∥_i x_i ≅ ∥_i y_i where each θ_i ∈ ≅_{A_i}. If the A_i are arenas we define 
the two other symmetries in the same way. 
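
On events and polarities, both constructions are straightforward. The Python sketch below shows them for event structures given only by a polarity dictionary; this stripped-down representation (no order, conflict, or symmetry) and the example arena are assumptions of the sketch.

```python
def dual(polarity):
    """Polarity of the dual: + and - are swapped, 0 is left unchanged."""
    flip = {"+": "-", "-": "+", "0": "0"}
    return {event: flip[p] for event, p in polarity.items()}


def parallel(components):
    """Tagged disjoint union of the components, with polarity taken componentwise."""
    return {
        (event, i): p
        for i, polarity in components.items()
        for event, p in polarity.items()
    }


# Hypothetical arena with one negative and one positive move.
A = {"in": "-", "out": "+"}
A_DUAL_PAR_A = parallel({1: dual(A), 2: A})
```

The last line builds the events of the dual of A in parallel with A, the shape used below for strategies from A to A.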


We can now define our morphisms: a strategy from A to B is a strategy 
on the arena A^⊥ ∥ B, i.e. a map σ : S ⇀ A^⊥ ∥ B. The event structure S consists 
of A-moves (those mapped to the A^⊥ component), B-moves, and internal 
(i.e. neutral) events. We sometimes write S : A ⇸ B. 

The purpose of the composition operation ⊙ which we proceed to define is 
therefore to produce, from a pair of strategies σ : S ⇀ A^⊥ ∥ B and τ : T ⇀ B^⊥ ∥ C, 
a strategy τ ⊙ σ : T ⊙ S ⇀ A^⊥ ∥ C. A constant feature of denotational games 
models is that composition is defined in two steps: interaction, in which S and 
T synchronise by playing matching B-moves, and hiding, where the matching 
pairs of events are deleted. The setting of this paper allows both σ and τ to be 
partial maps, so that in general there can be neutral events in both S and T; 
these never synchronise, and indeed they should not be hidden, since we aim to 
give an account of internal sampling. 

Before moving on to composition, a word of warning: the resulting structure 
will not be a category. Instead, arenas and strategies assemble into a weaker 
structure called a bicategory [6]. Bicategories have objects, morphisms, and 2- 
cells (morphisms between morphisms), and the associativity and identity laws 
are relaxed, and only need to hold up to isomorphisms. (This situation is rela- 
tively common for intensional models of non-determinism.) 


Definition 13. Two strategies σ : S ⇀ A^⊥ ∥ B and σ' : S' ⇀ A^⊥ ∥ B are 
isomorphic if there is a bijection f : S ≅ S' preserving all structure, and such 
that for every x ∈ 𝒞(S), the bijection with graph {(σ(s), σ'(f(s))) | s ∈ x} is in 
≅^+_{A^⊥ ∥ B}. 


Intuitively, S and S' have the same moves up to the choice of copy indices. 
We know from [17] that isomorphism is preserved by composition (and all other 
constructions), so from now on we always consider strategies up to isomorphism; 
then we will get a category. 


Interaction. In what follows we assume fixed Bayesian innocent strategies S : 
A ⇸ B and T : B ⇸ C as above, and study their interaction. We have hinted 
at the concept of “matching events” but the more convenient notion is that of 
matching configurations, which we define next. 


Definition 14. Configurations x_S ∈ 𝒞(S) and x_T ∈ 𝒞(T) are matching if 
there are x_A ∈ 𝒞(A) and x_C ∈ 𝒞(C) such that σx_S ∥ x_C = x_A ∥ τx_T. 


There is an event structure with symmetry T ⊛ S whose configurations correspond 
precisely to matching pairs; it is a well-known fact in game semantics 
that innocent strategies compose “like relations” [43,15]. Because “matching” 
B-moves have a different polarity in S and T, there is an ambiguity in the polarity 
of some events in T ⊛ S; we address this after the lemma. 


Lemma 1. Ignoring polarity, there is, up to isomorphism, a unique event structure 
with symmetry T ⊛ S, such that: 


— There is an order-isomorphism 𝒞(T ⊛ S) ≅ {(x_S, x_T) ∈ 𝒞(S) × 𝒞(T) | 
x_S and x_T matching}. Write x_T ⊛ x_S for the configuration corresponding 
to (x_S, x_T). 

— There are partial functions Π_S : T ⊛ S ⇀ S and Π_T : T ⊛ S ⇀ T, such that 
for every x_T ⊛ x_S ∈ 𝒞(T ⊛ S), Π_S(x_T ⊛ x_S) = x_S and Π_T(x_T ⊛ x_S) = x_T. 

— For every e, e' ∈ T ⊛ S, e → e' iff either Π_S(e) → Π_S(e') or Π_T(e) → Π_T(e'), 
and the same property holds for the conflict and data dependency relations. 

— Π_S and Π_T preserve and reflect labels. 

— A bijection θ : x_T ⊛ x_S ≅ y_T ⊛ y_S is in ≅_{T⊛S} if both Π_T θ : x_T ≅_T y_T and 
Π_S θ : x_S ≅_S y_S. 


Furthermore, for every e ∈ T ⊛ S, at least one of Π_S(e) and Π_T(e) is defined. 


When reasoning about the polarity of events in T ⊛ S, a subtlety arises 
because B-moves are not assigned the same polarity in S and T. This is not 
surprising: polarity is there precisely to allow strategies to communicate by sending 
(+) and receiving (−) values; in this interaction, S and T play complementary 
roles. To reason about the flow of information in the event structure T ⊛ S it 
will be important, for each B-move e of T ⊛ S, to know whether it is positive 
in S or in T; in other words, whether information is flowing from S to T, or 
vice-versa. 

Accordingly, we define pol^⊛ : T ⊛ S → {+^S, +^T, 0^S, 0^T, −}, as follows: 


pol^⊛(e) = +^S (resp. 0^S)   if Π_S(e) is defined and pol(Π_S(e)) = + (resp. 0), 
           +^T (resp. 0^T)   if Π_T(e) is defined and pol(Π_T(e)) = + (resp. 0), 
           −                 otherwise. 


Probability in the interaction. Unlike with polarity, S and T agree on what 
measurable space to assign to each B-move, since by the conditions on strategies, 
this is determined by the arena. So for each e ∈ T ⊛ S we can set M(e) = 
M(Π_S(e)) or M(Π_T(e)), unambiguously, and an easy argument shows that this 
makes T ⊛ S a well-defined measurable event structure with symmetry. 

We can turn T ⊛ S into a quantitative event structure by defining a kernel 
k_e^⊛ : M(pa^⊛(e)) ⇝ M(e) for every e ∈ T ⊛ S such that pol^⊛(e) ≠ −. The 
key observation is that when pol^⊛(e) ∈ {+^S, 0^S}, the parents of e correspond 
precisely to the parents of Π_S(e) in S. Since Π_S preserves the measurable space 
associated to an event, we may then take k_e^⊛ = k_{Π_S(e)}. 


Hiding. Hiding is the process of deleting the B-moves from T ⊛ S, yielding a 
strategy from A to C. The B-moves are exactly those on which both projections 
are defined, so the new set of events is obtained as follows: 


T ⊙ S = {e ∈ T ⊛ S | Π_S(e) and Π_T(e) are not both defined}. 


This set inherits a preorder ≤, conflict relation #, and measurable structure 
directly from T ⊛ S. Polarity is lifted from either S or T via the projections. 
(Note that by removing the B-moves we resolved the mismatch.) To define the 
data flow dependency, we must take care to ensure that the resulting T ⊙ S is 
Bayesian. For e, e' ∈ T ⊙ S, we say e --> e' if one of the following holds: 


(1) There exist n ≥ 0 and e_1, ..., e_n ∈ T ⊛ S, all B-moves, such that e --> 
e_1 --> ··· --> e_n --> e' (in T ⊛ S). 

(2) There exist a non-uniform d ∈ T ⊛ S, n ≥ 0 and e_1, ..., e_n ∈ T ⊛ S, all 
B-moves, such that e --> e_1 --> ··· --> e_n --> d and d ≤ e'. 
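
The set of events of the composite, and clause (1) of the data flow relation, can be sketched in Python as follows. Projections are represented as dictionaries that are simply undefined on some events; this encoding, the function names, and the four-event example are assumptions made for the illustration, and clause (2), which involves non-uniform events, is deliberately left out.

```python
def hide(events, proj_s, proj_t):
    """Keep the events on which the two projections are not both defined."""
    return {e for e in events if not (e in proj_s and e in proj_t)}


def composite_data_flow(events, b_moves, flow):
    """Clause (1): follow data flow from a visible event through hidden B-moves only."""
    visible = events - b_moves
    result = set()
    for source in visible:
        seen, stack = set(), [source]
        while stack:
            node = stack.pop()
            for (u, w) in flow:
                if u != node or w in seen:
                    continue
                seen.add(w)
                if w in b_moves:
                    stack.append(w)  # hidden: keep following the path
                else:
                    result.add((source, w))  # visible: record the dependency
    return result


# Hypothetical interaction: a is an A-move, b a B-move, c a C-move, s internal to S.
EVENTS = {"a", "b", "c", "s"}
PROJ_S = {"a": "a_S", "b": "b_S", "s": "s_S"}
PROJ_T = {"b": "b_T", "c": "c_T"}
FLOW = {("a", "b"), ("b", "c"), ("a", "s")}

VISIBLE = hide(EVENTS, PROJ_S, PROJ_T)
B_MOVES = EVENTS - VISIBLE
```

Here the dependency of c on a, which in the interaction passes through the hidden move b, becomes a direct data flow arrow after hiding.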


From a configuration x ∈ 𝒞(T ⊙ S) we can recover the hidden moves to get 
an interaction witness x̄ = {e ∈ T ⊛ S | e ≤ e' ∈ x}, a configuration in 
𝒞(T ⊛ S). For x, y ∈ 𝒞(T ⊙ S), a bijection θ : x ≅ y is in ≅_{T⊙S} if there is 
θ̄ : x̄ ≅_{T⊛S} ȳ which restricts to θ. This gives a measurable event structure with 
symmetry T ⊙ S. 

To make T ⊙ S a Bayesian event structure, we must define for every e ∈ T ⊙ S 
a kernel k_e, which we denote k_e^⊙ to emphasise the difference with the kernel k_e^⊛ 
defined above. Indeed the parents pa^⊛(e) of e in T ⊛ S may no longer exist in 
T ⊙ S, where e has a different set of parents pa^⊙(e). 

We therefore consider the subset of hidden ancestors of e which ought to 
affect the kernel k_e^⊙: 


Definition 15. For strategies S : A+ B and T : B >C, ande€ TOS, an 
essential hidden ancestor of e is a B-move dE T ® S, such that d < e and 
one of the following holds: 


(1) There are e € pa®(e),e2 € pa®(e) such that e1 --> ++» --> d --> +++ =--> e2. 
(2) There are eo € pa® (e), B-moves d' and e1,...,en, with d! non-uniform, 
such that eo --> e1 --> <> --> ej --> d --> ej41 =--> t --> En -> d'. 
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Since 7 © S is innocent, e has a sequential history, and thus the set of essential 
hidden ancestors of e forms a finite, total preorder, for which there exists a linear 
enumeration dı <--- < dn. We then define k? : M(pa(e)) ~> M(e) as follows: 


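\[
k^{\odot}_e\bigl(\mathrm{pa}^{\odot}(e), U\bigr)
  = \int_{d_1} k^{\circledast}_{d_1}\bigl(\mathrm{pa}^{\circledast}(d_1), \mathrm{d}d_1\bigr)
    \cdots
    \int_{d_n} k^{\circledast}_{d_n}\bigl(\mathrm{pa}^{\circledast}(d_n), \mathrm{d}d_n\bigr)
    \Bigl[ k^{\circledast}_e\bigl(\mathrm{pa}^{\circledast}(e), U\bigr) \Bigr]
\]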
where we abuse notation: using that for every i ≤ n, pa^⊛(d_i) ⊆ pa^⊙(e) ∪
{d_j | j < i}, we may write pa^⊛(d_i) for the only element of M(pa^⊛(d_i))
compatible with pa^⊙(e) and d_1, ..., d_{i−1}. The particular choice of linear
enumeration does not matter by Fubini’s theorem for s-finite kernels.
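
As a finite, discrete analogue of the formula above, hiding a B-move amounts to summing it out. The sketch below is only an illustration under simplifying assumptions: kernels are dictionaries from a parent value to a finite distribution, there is a single hidden ancestor, and the names `compose_hidden`, `k_d` and `k_e` are hypothetical and do not come from the paper.

```python
from fractions import Fraction

def compose_hidden(k_d, k_e):
    """Sum out a hidden intermediate value d.

    k_d maps a visible parent value a to a distribution over d;
    k_e maps d to a distribution over the output value u.
    Returns the kernel a -> distribution over u.
    """
    composite = {}
    for a, dist_d in k_d.items():
        out = {}
        for d, p_d in dist_d.items():
            for u, p_u in k_e[d].items():
                # Discrete counterpart of integrating k_e against k_d.
                out[u] = out.get(u, Fraction(0)) + p_d * p_u
        composite[a] = out
    return composite

half = Fraction(1, 2)
k_d = {"a0": {"d0": half, "d1": half}}
k_e = {"d0": {"u0": Fraction(1)}, "d1": {"u0": half, "u1": half}}
k = compose_hidden(k_d, k_e)
```

With several hidden ancestors one would nest such sums in the order of the chosen enumeration; for finite distributions the order of summation does not affect the result.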


Lemma 2. There is a map τ ⊙ σ : T ⊙ S → A^⊥ ∥ C making T ⊙ S a Bayesian
strategy. We call this the composition of S and T.


Copycat. We have defined morphisms between arenas, and how they compose.
We now define identities, called copycat strategies. In the semantics of our
language, these are used to interpret typing judgements of the form
x : A ⊢ x : A, and the copycat acts by forwarding values received on one side
across to the other. To guide the intuition, the copycat strategy for the game
⟦R⟧ ⊸ ⟦R⟧ is pictured in Fig. 8. (We will define the ⊸ construction later.)


Fig. 8: The arena ⟦R⟧ ⊸ ⟦R⟧ (a), and the copycat strategy on it (b).


Formally, the copycat strategy on an arena A is a Bayesian event structure
(with symmetry) CC_A, together with a (total) map cc_A : CC_A → A^⊥ ∥ A. As
should be clear in the example of Fig. 8, the events, polarity, conflict, and
measurable structure of CC_A are those of A^⊥ ∥ A. The order ≤ is the transitive
closure of that in A^⊥ ∥ A enriched with the pairs {((a,1), (a,2)) | a ∈ A and
pol_A(a) = +} ∪ {((a,2), (a,1)) | pol_A(a) = −}. The same sets of pairs also
make up the data dependency relation in CC_A; recall that there is no data
dependency in the event structure A. Note that because CC_A is just A^⊥ ∥ A with
added constraints, configurations of CC_A can be seen as a subset of those of
A^⊥ ∥ A, and thus the symmetry ≅_{CC_A} is inherited from ≅_{A^⊥ ∥ A}.

To make copycat a Bayesian strategy, we observe that for every positive
e ∈ CC_A, pa(e) contains a single element, the corresponding negative move in
A^⊥ ∥ A, which carries the same measurable space. Naturally, we take
k_e : M(e) ⇝ M(e) to be the identity kernel.

We have defined objects, morphisms, composition, and identities. They as- 
semble into a category. 


Theorem 1. Arenas and Bayesian strategies, with the latter considered up to
isomorphism, form a category BG. BG has a subcategory BG^+ whose objects
are positive, regular arenas and whose morphisms are negative strategies
(i.e. strategies whose initial moves are negative), up to isomorphism.


The restriction implies (using receptivity) that for every strategy S : A ⇸ B in
BG^+, initial moves of S correspond to init(A). This reflects the dynamics of a
call-by-value language, where arguments are received before anything else. We
now set out to define the semantics of our language in BG^+.


5 A denotational model 


In Sec. 5.1, we describe some abstract constructions in the category, which pro- 
vide the necessary ingredients for interpreting types and terms in Sec. 5.2. 


5.1 Categorical structure 


The structure required to model a calculus of this kind is fairly standard. The 
first games model for a call-by-value language was given by Honda and Yoshida 
[31] (see also [4]). Their construction was re-enacted in the context of concurrent 
games by Clairambault et al. [20], from whom we draw inspiration. The
adaptation is not automatic, however, as we must account for measurability,
probability, data flow, and an interpretation of product types based on coincidences.


Coproducts. Given arenas A and B, their sum A + B has events those of A ∥ B,
and inherited polarity, preorder, and measurable structure, but the conflict
relation is extended so that a # b for every a ∈ A and b ∈ B. The symmetries
≅_{A+B}, ≅^−_{A+B} and ≅^+_{A+B} are restricted from ≅_{A∥B}, ≅^−_{A∥B} and
≅^+_{A∥B}.

The arena A + B is a coproduct of A and B in BG^+. This means that there
are injections ι_A : A ⇸ A + B and ι_B : B ⇸ A + B behaving as copycat on the
appropriate component, and that any two strategies σ : A ⇸ C and τ : B ⇸ C
induce a unique co-pairing strategy denoted [σ, τ] : A + B ⇸ C. This construction
can be performed for any arity, giving coproducts Σ_{i∈I} A_i.


Tensor. Tensor products are more subtle, partly because in this paper we use 
coincidence to deal with pairs, as motivated in Sec. 3.3. For example, given two 
arenas each having a single initial move, we construct their tensor product by 
taking their parallel composition and making the two initial moves coincident. 

Fig. 9: Example of tensor construction.


More generally, suppose A and B are arenas in which all initial events are
coincident; we call these elementary arenas. Then A ⊗ B has all structure
inherited from A ∥ B, and additionally we make a and b coincident for every
a ∈ init(A) and b ∈ init(B). Since 𝒞(A ⊗ B) ⊆ 𝒞(A ∥ B), we can define symmetries
on A ⊗ B by restricting those in A ∥ B.

Now, because arenas in BG^+ are regular (Definition 9), it is easy to see that
each A is isomorphic to a sum Σ_{i∈I} A_i with each A_i elementary. If B ∈ BG^+ is
isomorphic to Σ_{j∈J} B_j with the B_j elementary, we define
A ⊗ B = Σ_{i,j} A_i ⊗ B_j.

In order to give semantics to pairs of terms, we must define the action of ⊗
on strategies. Consider two strategies σ : S → A^⊥ ∥ A′ and τ : T → B^⊥ ∥ B′.
Let σ ∥ τ : S ∥ T → (A ∥ B)^⊥ ∥ (A′ ∥ B′) be defined in the obvious way from
σ and τ (note the codomain was rearranged). We observe that
𝒞((A ⊗ B)^⊥ ∥ (A′ ⊗ B′)) ⊆ 𝒞((A ∥ B)^⊥ ∥ (A′ ∥ B′)) and show:


Lemma 3. Up to symmetry, there is a unique event structure S ⊗ T such that
𝒞(S ⊗ T) = {x ∈ 𝒞(S ∥ T) | (σ ∥ τ) x ∈ 𝒞((A ⊗ B)^⊥ ∥ (A′ ⊗ B′))} and such that
polarity, labelling, and data flow are lifted from S ∥ T via a projection function
S ⊗ T → S ∥ T.


Informally, the strategies synchronise at the start, i.e. all initial moves are
received at the same time, and they synchronise again when they are both ready
to move to the A′ ⊗ B′ side for the first time.

The operations − ⊗ B and A ⊗ − on BG^+ define functors. However, as is
typically the case for models of call-by-value, the tensor fails to be bifunctorial,
and thus BG^+ is not monoidal but only premonoidal [50]. The unit for ⊗ is the
arena 1 with one (positive) event () : 1. There are “copycat-like” associativity,
unit and braiding strategies, which we omit.

The failure of bifunctoriality in this setting means that for σ : A ⇸ A′ and
τ : B ⇸ B′, the strategy S ⊗ T is in general distinct from the following two
strategies:


S ⊗_l T = (CC_{A′} ⊗ T) ⊙ (S ⊗ CC_B)        S ⊗_r T = (S ⊗ CC_{B′}) ⊙ (CC_A ⊗ T)


See Fig. 9 for an example of the ⊗ and ⊗_l constructions on simple strategies.
Observe that the data flow relation is not affected by the choice of tensor: this is
related to our discussion of commutativity in Sec. 1.1: a commutative semantics
is one that satisfies ⊗_l = ⊗_r = ⊗.
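
The difference between the left and right tensors can be illustrated, very loosely, with two effectful functions that log the order in which they run. This is a minimal sketch and not the game-semantic construction: `left_tensor`, `right_tensor`, `sample_x` and `sample_y` are hypothetical names, and the shared `trace` list stands in for control flow.

```python
def left_tensor(f, g):
    """Run f on the first component, then g on the second."""
    def run(pair, trace):
        a, b = pair
        a2 = f(a, trace)
        b2 = g(b, trace)
        return (a2, b2)
    return run

def right_tensor(f, g):
    """Run g on the second component, then f on the first."""
    def run(pair, trace):
        a, b = pair
        b2 = g(b, trace)
        a2 = f(a, trace)
        return (a2, b2)
    return run

def sample_x(a, trace):
    trace.append("x")  # record that this effect happened
    return a + 1

def sample_y(b, trace):
    trace.append("y")
    return b * 2

trace_l, trace_r = [], []
out_l = left_tensor(sample_x, sample_y)((1, 2), trace_l)
out_r = right_tensor(sample_x, sample_y)((1, 2), trace_r)
```

Both versions compute the same pair, since each output depends only on its own input, but the recorded order of effects differs. This mirrors the observation that the choice of tensor changes control flow while leaving data flow untouched.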

We will make use of the left tensor ⊗_l in our denotational semantics, because
it reflects a left-to-right evaluation strategy, which is standard. It will also be
important that the interpretation of values lies in the centre of the premonoidal
category, which consists of those strategies S for which S ⊗_l T = S ⊗_r T and
T ⊗_l S = T ⊗_r S for every T. Finally we note that ⊗ distributes over +, in the
sense that for every A, B, C the canonical strategy
(A ⊗ B) + (A ⊗ C) ⇸ A ⊗ (B + C) has an inverse λ.


Function spaces. We now investigate the construction of arenas of the form
A ⊸ B. This is a linear function space construction, allowing at most one call
to the argument A; in Sec. 5.1 we will construct an extended arena !(A ⊸ B)
permitting arbitrary usage. Given A and B we construct A ⊸ B as follows. (This
construction is the same as in other call-by-value game semantics, e.g. [31,20].)
Recall that we can write A = Σ_{i∈I} A_i with each A_i an elementary arena. Then,
A ⊸ B has the same set of events as 1 ∥ Σ_{i∈I} (A_i^⊥ ∥ B), with inherited
polarity and measurable structure, but with a preorder enriched with the pairs
{(λ, a) | a ∈ init(A)} ∪ {(a_i, (i, b)) | a ∈ init(A_i), b ∈ init(B)}, where in this
case we call λ the unique move of 1.

For every strategy σ : A ⊗ B ⇸ C we call Λ(σ) : A ⇸ B ⊸ C the strategy
which, upon receiving an opening A-move (or coincidence) a, deterministically
(and with no data-flow link) plays the move λ in B ⊸ C, waits for Opponent
to play a B-move (or coincidence) b and continues as σ would on the coincident
input of a and b. Additionally there is for every B and C an evaluation morphism
ev_{B,C} : (B ⊸ C) ⊗ B ⇸ C defined as in [20].


Lemma 4. For a strategy σ : A ⊗ B ⇸ C, the strategy Λ(σ) is central and
satisfies ev ⊙ (Λ(σ) ⊗ cc_B) = σ.


Duplication. We define, for every arena A, a “reusable” arena !A. Its precise
purpose will become clear when we define the semantics of our language. It is
helpful to start with the observation that ground type values are readily
duplicable, in the sense that there is a strategy ⟦R⟧ ⇸ ⟦R⟧ ⊗ ⟦R⟧ in BG. Therefore !
will have no effect on ⟦R⟧, but only on more sophisticated arenas (e.g.
⟦R⟧ ⊸ ⟦R⟧) for which no such (well-behaved) map exists. We start by studying
negative arenas.


Definition 16. Let A be a negative arena. We define !A to be the measurable
event structure !A = ∥_{i∈ω} A, equipped with the following symmetries:


– ≅_{!A} contains those θ : ∥_{i∈ω} x_i ≅ ∥_{i∈ω} y_i for which there is π : ω ≅ ω and
θ_i : x_i ≅_A y_{π(i)} such that θ(a, i) = (θ_i(a), π(i)) for each (a, i) ∈ !A.

– ≅^−_{!A} contains bijections θ : x ≅_{!A} y such that for each i ∈ ω,
θ_i : x_i ≅^−_A y_{π(i)}.

– ≅^+_{!A} contains bijections θ : x ≅_{!A} y s.t. π = id and for each i,
θ_i : x_i ≅^+_A y_i.

It can be shown that !A is a well-defined negative arena, i.e. meets the
conditions of Definition 9. Observe that an elementary positive arena B corresponds
precisely to a set e of coincident positive events, all initial for ≤, immediately
followed by a negative arena which we call B_−. Followed here means that e′ ≤ b
for all e′ ∈ e and b ∈ B_−, and we write B = e · B_−. We define !B = e · !B_−. Finally,
recall that an arbitrary positive arena B can be written as a sum of elementary
ones: B = Σ_{i∈I} B_i. We then define !B = Σ_{i∈I} !B_i.

Fig. 10: Constant strategies. (The copy indices i in f indicate that we have ω
symmetric branches.)

For positive A and B, a central strategy σ : A ⇸ B induces a strategy
!σ : !A ⇸ !B, and this is functorial. The functor ! extends to a linear exponential
comonad on the category with elementary arenas as objects and central strategies
as morphisms (see [20] for the details of a similar construction).


Recursion. To interpret fixed points, we consider an ordering relation on strate- 
gies. We momentarily break our habit of considering strategies up to isomor- 
phism, as in this instance it becomes technically inconvenient [17]. 


Definition 17. If σ : S → A and τ : T → A are strategies, we write S ⊑ T if
S ⊆ T, the inclusion map is a map of event structures, preserves all structure,
including kernels, and for every s ∈ S, σ(s) = τ(s).


Lemma 5. Every ω-chain S_0 ⊑ S_1 ⊑ ... has a least upper bound ⋁_{i∈ω} S_i, given
by the union ⋃_{i∈ω} S_i, with all structure obtained by componentwise union.


There is also a least strategy ⊥ on every arena, unique up to isomorphism. We
are now ready to give the semantics of our language.


5.2 Denotational semantics 


The interpretation of types is as follows: 

⟦A + B⟧ = ⟦A⟧ + ⟦B⟧        ⟦A × B⟧ = ⟦A⟧ ⊗ ⟦B⟧        ⟦A → B⟧ = !(⟦A⟧ ⊸ ⟦B⟧)


Fig. 11: Interpretation of terms as strategies.


This interpretation extends to contexts via ⟦·⟧ = 1 and ⟦x_1 : A_1, ..., x_n : A_n⟧ =
⟦A_1⟧ ⊗ ... ⊗ ⟦A_n⟧. (In Fig. 7 we used ⟦Γ ⊢ A⟧ to refer to the arena ⟦Γ⟧^⊥ ∥ ⟦A⟧.)
A term Γ ⊢ M : A is interpreted as a strategy ⟦M⟧^Γ : ⟦Γ⟧ ⇸ ⟦A⟧, defined
inductively. For every type A, the arena ⟦A⟧ is both a !-coalgebra and a
commutative comonoid, so there are strategies w_A : ⟦A⟧ ⇸ 1, c_A : ⟦A⟧ ⇸ ⟦A⟧ ⊗ ⟦A⟧,
and h_A : ⟦A⟧ ⇸ !⟦A⟧. Using that the comonad ! is monoidal, this structure extends
to contexts; we write c_Γ, w_Γ and h_Γ for the induced maps. The interpretation of
constants is shown in Fig. 10, and the rest of the semantics is given in Fig. 11.


Lemma 6. For a value Γ ⊢ V : A, the strategy ⟦V⟧ is central.


The semantics is sound for the usual call-by-value equations. 


Proposition 1. For arbitrary terms M, P, N_1, N_2 and values V, W,


⟦(λx.M) V⟧^Γ = ⟦M[V/x]⟧^Γ

⟦match (V, W) with (x, y) → P⟧^Γ = ⟦P[V/x][W/y]⟧^Γ

⟦match inl V with [inl x → N_1 | inr x → N_2]⟧^Γ = ⟦N_1[V/x]⟧^Γ.

The equations are directly verified. Standard reasoning principles apply given
the categorical structure we have outlined above. (It is well known that
premonoidal categories provide models for call-by-value [50], and our interpretation
is a version of Girard’s translation of call-by-value into linear logic [29].)


6 Conclusion and perspectives


We have defined, for every term Γ ⊢ M : A, a strategy ⟦M⟧^Γ. This gives a model
for probabilistic programming which provides an explicit representation of data
flow. In particular, if ⊢ M : 1, and M has no subterm of type B + C, then the
Bayesian strategy ⟦M⟧ is a Bayesian network equipped with a total ordering
of its nodes: the control flow relation ≤. Our proposed compositional semantics
additionally supports sum types, higher types, and open terms.

This paper does not contain an adequacy result, largely for lack of space: 
the ‘Monte Carlo’ operational semantics of probabilistic programs is difficult to 
define in full rigour. In further work I hope to address this and carry out the 
integration of causal models into the framework of [53]. The objective remains 
to obtain proofs of correctness for existing and new inference algorithms. 


Related work on denotational semantics. Our representation of data flow based 
on coincidences and a relation --> is novel, but the underlying machinery relies 
on existing work in concurrent game semantics, in particular the framework of 
games with symmetry developed by Castellan et al. [17]. This was applied to 
a language with discrete probability in [15], and to a call-by-name and affine 
language with continuous probability in [49]. This paper is the first instance 
of a concurrent games model for a higher-order language with recursion and 
continuous probability, and the first to track internal sampling and data flow. 

There are other interactive models for statistical languages, e.g. by Ong and 
Vakar [47] and Dal Lago et al. [38]. Their objectives are different: they do not 
address data flow (i.e. their semantics only represents the control flow), and do 
not record internal samples. 

Prior to the development of probabilistic concurrent games, probabilistic no- 
tions of event structures were considered by several authors (see [58,1,59]). The 
literature on probabilistic Petri nets is also important related work, as Petri nets can
sometimes provide finite representations for infinite event structures. Markov 
nets [7,2] satisfy conditional independence conditions based on the causal struc- 
ture of Petri nets. More recently Bruni et al. [12,13] relate a form of Petri nets 
to Bayesian networks and inference, though their probability spaces are discrete. 


Related work on graphical representations. Our event structures are reminiscent 
of Jeffrey’s graphical language for premonoidal categories [35], which combines 
string diagrams [36] with a control flow relation. Note that in event structures 
the conflict relation provides a model for sum types, which is difficult to obtain 
in Jeffrey’s setting. The problem of representing sum types arises also in proba- 
bilistic modelling, because Bayesian networks do not support them: [45] propose 
an extended graphical language, which could serve to interpret first-order proba- 
bilistic programs with conditionals. Another approach is by [42], whose Bayesian 
networks have edges labelled by predicates describing the branching condition. 
Finally, the theory of Bayesian networks has also been investigated extensively 
by Jacobs [34] with a categorical viewpoint. It will be important to understand 
the formal connections between our work and the above. 


Abstract. We propose a logic for temporal properties of higher-order
programs that handle infinite objects like streams or infinite trees, rep- 
resented via coinductive types. Specifications of programs use safety and 
liveness properties. Programs can then be proven to satisfy their specifi- 
cation in a compositional way, our logic being based on a type system. 
The logic is presented as a refinement type system over the guarded
λ-calculus, a λ-calculus with guarded recursive types. The refinements
are formulae of a modal μ-calculus which embeds usual temporal modal
logics such as LTL and CTL. The semantics of our system is given within 
a rich structure, the topos of trees, in which we build a realizability model 
of the temporal refinement type system. 


Keywords: coinductive types, guarded recursive types, μ-calculus,
refinement types, topos of trees.


1 Introduction 


Functional programming is by now well established to handle infinite data, 
thanks to declarative definitions and equational reasoning on high-level abstrac- 
tions, in particular when infinite objects are represented with coinductive types. 
In such settings, programs in general do not terminate, but are expected to com- 
pute a part of their output in finite time. For example, a program expected to 
generate a stream should produce the next element in finite time: it is productive. 

Our goal is to prove input-output temporal properties of higher-order
programs that handle coinductive types. Logics like LTL, CTL or the modal
μ-calculus are widely used to formulate, on infinite objects, safety and liveness
properties. Safety properties state that some “bad” event will not occur, while
liveness properties specify that “something good” will happen (see e.g. [9]).
Typically, modalities like □ (always) or ◇ (eventually) are used to write properties
of streams or infinite trees and specifications of programs over such data.

We consider temporal refinement types {A | φ}, where A is a standard type
of our programming language, and φ is a formula of the modal μ-calculus. Using
refinement types [22], temporal connectives are not reflected in the programming
language, and programs are formally independent from the shape of their tem- 
poral specifications. One can thus give different refinement types to the same 
program. For example, the following two types can be given to the same map 
function on streams: 


map : ({B | ψ} → {A | φ}) → {Str B | □◇[hd]ψ} → {Str A | □◇[hd]φ}
map : ({B | ψ} → {A | φ}) → {Str B | ◇□[hd]ψ} → {Str A | ◇□[hd]φ}


These types mean that given f : B → A s.t. f(b) satisfies φ if b satisfies ψ,
the function (map f) takes a stream with infinitely many (resp. ultimately all)
elements satisfying ψ to one with infinitely many (resp. ultimately all) elements
satisfying φ. For φ a formula over A, [hd]φ is a formula over streams of A’s which
holds on a given stream if φ holds on its head element.

It is undecidable whether a given higher-order program satisfies a given input-
output temporal property written with formulae of the modal μ-calculus [$1].
Having a type system is a partial workaround to this obstacle, which moreover
enables compositional reasoning on programs, by decomposing a specification
across the various components of a program in order to prove its global specification.

Our system is built on top of the guarded λ-calculus [18], a higher-order
programming language with guarded recursion [52]. Guarded recursion is a simple
device to control and reason about unfoldings of fixpoints. It can represent coin- 
ductive types and provides a syntactic compositional productivity check [5]. 

Safety properties (e.g. □[hd]φ) can be correctly represented with guarded
fixpoints, but not liveness properties (e.g. ◇[hd]φ, ◇□[hd]φ, □◇[hd]φ). Combining
liveness with guarded recursion is a challenging problem since guarded fixpoints
tend to have unique solutions. Existing approaches to handle temporal types in
presence of guarded recursion face similar difficulties. Functional reactive
programming (FRP) [21] provides a Curry-Howard correspondence for temporal
logics in which logical connectives are reflected as programming constructs.
When combining FRP with guarded recursion [44,7], and in particular
to handle liveness properties [8], uniqueness of guarded fixpoints is tempered by
specific recursors for temporal types.

Our approach is different from [8], as we wish as much as possible the logi- 
cal level not to impact the program level. We propose a two level system, with 
the lower or internal level, which interacts with guarded recursion and at which 
only safety properties are correctly represented, and the higher or external one, 
at which liveness properties are correctly handled, but without direct access to 
guarded recursion. By restricting to the alternation-free modal μ-calculus, in
which fixpoints can always be computed in w-steps, one can syntactically reason 
on finite unfoldings of liveness properties, thus allowing for crossing down the 
safety barrier. Soundness is proved by a realizability interpretation based on the 
semantics of guarded recursion in the topos of trees [13], which correctly repre- 
sents the usual set-theoretic final coalgebras of polynomial coinductive types [50]. 

We provide example programs involving linear structures (colists, streams,
fair streams [17,8]) and branching structures (resumptions à la [44]), for which
we prove liveness properties similar to above. Our system also handles safety
properties on breadth-first (infinite) tree traversals à la [35] and [10].


Cons^g := λx.λs. fold(⟨x, s⟩) : A → ▸Str^g A → Str^g A
hd^g := λs. π_0(unfold s) : Str^g A → A
tl^g := λs. π_1(unfold s) : Str^g A → ▸Str^g A
map^g := λf. fix(g). λs. Cons^g (f (hd^g s)) (g ⊛ (tl^g s)) : (B → A) → Str^g B → Str^g A


Fig. 1. Constructor, Destructors and Map on Guarded Streams.


Organization of the paper. We give an overview of our approach in §2.
Then §3 presents the syntax of the guarded λ-calculus. Our base temporal logic
(without liveness) is introduced in §4, and is used to define our refinement type
system in §5. Liveness properties are handled in §6. The semantics is given in
§7, and §8 presents examples. Finally, we discuss related work in §9 and future
work in §10. Table 4 (§8) gathers the main refinement types we can give to example
functions, most of them defined in Table 3. Omitted material is available in [28].


2 Outline 


Overview of the Guarded λ-Calculus. Guarded recursion enforces
productivity of programs using a type system equipped with a type modality ▸, in
order to indicate that one has access to a value not right now but only “later”.
One can define guarded streams Str^g A over a type A via the guarded recursive
definition Str^g A = A × ▸Str^g A. Streams that inhabit this type have their head
available now, but their tail only one step in the future. The type modality ▸ is
reflected in programs with the next operation. One also has a fixpoint constructor
on terms fix(x).M for guarded recursive definitions. They are typed with


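\[
\frac{\mathcal{E} \vdash M : A}
     {\mathcal{E} \vdash \mathsf{next}(M) : {\blacktriangleright} A}
\qquad
\frac{\mathcal{E},\, x : {\blacktriangleright} A \vdash M : A}
     {\mathcal{E} \vdash \mathsf{fix}(x).M : A}
\]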


This allows for the constructor and basic destructors on guarded streams to
be defined as in Fig. 1, where fold(−) and unfold(−) are explicit operations
for folding and unfolding guarded recursive types. In the following, we use the
infix notation a ::^g s for Cons^g a s. Using the fact that the type modality ▸
is an applicative functor [49], we can distribute ▸ over the arrow type. This is
represented in the programming language by the infix applicative operator ⊛.
With it, one can define the usual map function on guarded streams as in Fig. 1.
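
To make the roles of next, ⊛ and the guarded fixpoint more concrete, here is a small untyped sketch in Python. It is only an approximation under stated assumptions: the later modality is modelled as a thunk (the hypothetical class `Later`), fold and unfold are left implicit, and nothing enforces guardedness, which in the calculus is the job of the type system. All names are illustrative.

```python
class Later:
    """A value available only one step later, modelled as a thunk."""
    def __init__(self, thunk):
        self._thunk = thunk

    def force(self):
        return self._thunk()

def next_(value):
    """Delay an already available value."""
    return Later(lambda: value)

def ap(later_f, later_x):
    """Applicative application: a delayed function applied to a delayed argument."""
    return Later(lambda: later_f.force()(later_x.force()))

def cons(head, later_tail):
    return (head, later_tail)

def hd(s):
    return s[0]

def tl(s):
    return s[1]

def map_g(f):
    def go(s):
        # The recursive call is only used under Later, mirroring fix(g).
        return cons(f(hd(s)), ap(Later(lambda: go), tl(s)))
    return go

def nats_from(n):
    return cons(n, Later(lambda: nats_from(n + 1)))

def take(n, s):
    """Observe the first n elements by forcing n tails."""
    out = []
    for _ in range(n):
        out.append(hd(s))
        s = tl(s).force()
    return out

doubled = take(4, map_g(lambda x: 2 * x)(nats_from(0)))
```

Each element of the output is produced after finitely many steps because the recursive occurrence of `go` sits behind a `Later`, which is the informal content of productivity.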


Compositional Safety Reasoning on Streams. Given a property φ on a
type A, we would like to consider a subtype of Str^g A that selects those streams
whose elements all satisfy φ. To do so, we use a temporal modal formula □[hd]φ,
and consider the refinement type {Str^g A | □[hd]φ}. Suppose for now that we
can give the following refinement types to the basic stream operations:


hd^g : {Str^g A | □[hd]φ} → {A | φ}
tl^g : {Str^g A | □[hd]φ} → ▸{Str^g A | □[hd]φ}
Cons^g : {A | φ} → ▸{Str^g A | □[hd]φ} → {Str^g A | □[hd]φ}


Typed Formulae   Provability        Refinement Types   Subtyping                  Typing
Σ ⊢ φ : A        ⊢^A φ              {A | φ}            T ≤ U                      E ⊢ M : T
(§4)             (where ⊢ φ : A)    (where ⊢ φ : A)    (T, U refinement types)

Table 1. Syntactic Classes and Judgments.


By using the standard typing rules for λ-abstraction and application, together
with the rules to type fix(x).M and ⊛, we can type the function map^g as


map^g : ({B | ψ} → {A | φ}) → {Str^g B | □[hd]ψ} → {Str^g A | □[hd]φ}


A Manysorted Temporal Logic. Our logical language, taken with minor
adaptations from [30], is manysorted: for each type A we have formulae of type
A (notation ⊢ φ : A), where φ selects inhabitants of A.

We use atomic modalities ([π_i], [fold], [next], ...) in refinements to navigate
between types (see Fig. ga). For instance, a formula φ of type A_0, specifying
a property over the inhabitants of A_0, can be lifted to the formula [π_0]φ of type
A_0 × A_1, which intuitively describes those inhabitants of A_0 × A_1 whose first
component satisfies φ. Given a formula φ of type A, one can define its “head
lift” [hd]φ of type Str^g A, that enforces φ to be satisfied on the head of the
provided stream. Also, one can define a modality ◯ such that given a formula
ψ : Str^g A, the formula ◯ψ : Str^g A enforces ψ to be satisfied on the tail of
the provided stream. These modalities are obtained resp. as [hd]φ := [fold][π_0]φ
and ◯ψ := [fold][π_1][next]ψ. We similarly have atomic modalities [in_0], [in_1] on
sum types. For instance, on the type of guarded colists defined as CoList^g A :=
Fix(X). 1 + A × ▸X, we can express the fact that a colist is empty (resp.
non-empty) with the formula [nil] := [fold][in_0]⊤ (resp. [¬nil] := [fold][in_1]⊤).

We also provide a deduction system ⊢^A φ on temporal modal formulae.
This deduction system is used to define a subtyping relation T ≤ U between
refinement types, with {A | φ} ≤ {A | ψ} when ⊢^A φ ⇒ ψ. The subtyping
relation thus incorporates logical reasoning in the type system.

In addition, we have greatest fixpoints formulae να.φ (so that formulae can
have free typed propositional variables), equipped with Kozen’s reasoning
principles [43]. In particular, we can form an always modality as □φ := να. φ ∧ ◯α,
with □φ : Str^g A if φ : Str^g A. The formula □φ holds on a stream s = (s_i | i ≥ 0)
iff φ holds on every substream (s_i | i ≥ n) for n ≥ 0. If we rather start with
ψ : A, one first needs to lift it to [hd]ψ : Str^g A. Then □[hd]ψ means that all the
elements of the stream satisfy ψ, since all its suffixes satisfy [hd]ψ.
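
The unfolding of the always modality can be mimicked on finite prefixes of a stream. The following sketch is only an illustration: streams are replaced by Python lists standing for prefixes, and `hd_lift` and `always` are hypothetical helper names.

```python
def hd_lift(phi):
    """[hd]φ: φ holds on the head of a non-empty prefix."""
    return lambda prefix: phi(prefix[0])

def always(psi, prefix):
    """□ψ := να. ψ ∧ ◯α, unfolded along a finite prefix.

    Running out of prefix counts as success: read as a greatest fixpoint,
    the property can only be refuted by an explicit violation.
    """
    if not prefix:
        return True
    return psi(prefix) and always(psi, prefix[1:])

is_even = lambda n: n % 2 == 0
ok = always(hd_lift(is_even), [0, 2, 4, 6])
bad = always(hd_lift(is_even), [0, 2, 3, 6])
```

A violation of □[hd]ψ is always witnessed by some finite prefix, which is what makes it a safety property.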

Table 1 summarizes the different judgments used in this paper.

Beyond Safety. In order to handle liveness properties, we also need to have
least fixpoints formulae μα.φ. For example, this would give the eventually
modality ◇φ := μα. φ ∨ ◯α. With Kozen-style rules, one could then give the following
two types to the guarded stream constructor:


Cons^g : {A | φ} → ▸Str^g A → {Str^g A | ◇[hd]φ}
Cons^g : A → ▸{Str^g A | ◇[hd]φ} → {Str^g A | ◇[hd]φ}


But consider a finite base type B with two distinguished elements a, b, and
suppose that we have access to a modality [b] on B so that terms inhabiting {B | [b]}
must be equal to b. Using the above types for Cons^g, we could type the stream
with constant value a, defined as fix(s). a ::^g s, with the type {Str^g B | ◇[hd][b]}
that is supposed to enforce the existence of an occurrence of b in the stream.
Similarly, on colists we would have fix(s). a ::^g s of type {CoList^g B | ◇[nil]}, while
◇[nil] expresses that a colist will eventually contain a nil, and is thus finite.
Hence, liveness properties may interact quite badly with guarded recursion. Let
us look at this in a semantic model of guarded recursion.
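
The problematic example can be checked concretely. In the sketch below (an illustration only, with hypothetical names), the stream fix(s). a :: s is modelled by `itertools.repeat`, and the least-fixpoint reading of ◇[hd][b] on a finite prefix demands an actual occurrence of b.

```python
from itertools import islice, repeat

def eventually_hd(phi, prefix):
    """Least-fixpoint reading of ◇[hd]φ on a finite prefix: a witness must occur."""
    return any(phi(x) for x in prefix)

is_b = lambda x: x == "b"
constant_a = repeat("a")  # stands for the stream fix(s). a :: s

# No prefix of the constant stream, however long, ever exhibits a b.
witnessed = [eventually_hd(is_b, list(islice(constant_a, n))) for n in (1, 10, 1000)]
```

So the constant stream should not inhabit the refinement ◇[hd][b], even though the two candidate types for the constructor would allow it to be typed that way.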


Internal Semantics in the Topos of Trees. The types of the guarded
λ-calculus can be interpreted as sequences of sets (X(n))_{n>0} where X(n) represents
the values available “at time n”. In order to interpret guarded recursion, one also
needs to have access to functions r^X_n : X(n + 1) → X(n), which tell how values
“at n + 1” can be restricted (actually most often truncated) to values “at n”. This
means that the objects used to represent types are in fact presheaves over the
poset (ℕ \ {0}, ≤). The category S of such presheaves is the topos of trees [13].
For instance, the type Str^g B of guarded streams over a finite base type B is
interpreted in S as (B^n)_{n>0}, with restriction maps taking (b_0, ..., b_{n−1}, b_n) to
(b_0, ..., b_{n−1}). We write ⟦A⟧ for the interpretation of a type A in S.
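
For a two-element base type, the sets X(n) and the restriction maps of this interpretation can be enumerated directly. This is a minimal sketch with hypothetical names (`B`, `X`, `restrict`), intended only to make the truncation concrete.

```python
from itertools import product

B = ("a", "b")  # a hypothetical two-element base type

def X(n):
    """X(n) = B^n: the stream prefixes visible at time n (n > 0)."""
    return set(product(B, repeat=n))

def restrict(prefix):
    """The restriction map X(n + 1) -> X(n), dropping the last element."""
    return prefix[:-1]

# Restricting every time-3 value yields exactly the time-2 values.
image = {restrict(p) for p in X(3)}
```

Here the restriction maps are surjective: every value at time n is the truncation of some value at time n + 1.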


The Necessity of an External Semantics. The topos of trees cannot
correctly handle liveness properties. For instance, the formula ◇[hd][b] cannot
describe in S the set of streams that contain at least one occurrence of b. Indeed,
the interpretation of ◇[hd][b] in S is a sequence (C(n))_{n>0} with C(n) ⊆ B^n. But
any element of B^n can be extended to a stream which contains an occurrence
of b. Hence C(n) should be equal to B^n, and the interpretation of ◇[hd][b] is
the whole ⟦Str^g B⟧. More generally, guarded fixpoints have unique solutions in
the topos of trees [13], and ◇φ = μα. φ ∨ ◯α gets the same interpretation as
να. φ ∨ ◯α.
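
The extension argument can be replayed for small n. In this sketch (illustrative only, with hypothetical names), every prefix is extended by a stream that emits b forever, so no prefix can be excluded from C(n).

```python
from itertools import chain, islice, product, repeat

B = ("a", "b")

def extend(prefix):
    """An infinite stream that starts with `prefix` and then emits b forever."""
    return chain(prefix, repeat("b"))

def compatible(prefix):
    """True if some stream containing b restricts to `prefix`."""
    n = len(prefix)
    witness = list(islice(extend(prefix), n + 1))
    return tuple(witness[:n]) == prefix and "b" in witness

# C(n) collects the prefixes of length n compatible with "eventually b".
C = {n: {p for p in product(B, repeat=n) if compatible(p)} for n in (1, 2, 3)}
```

Each C(n) comes out as all of B^n, matching the claim that the internal interpretation of the formula is the whole object of streams.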

We thus have a formal system with least and greatest fixpoints, that has a 
semantics inside the topos of trees, but which does not correctly handle least 
fixpoints. On the other hand, it was shown by [50] that the interpretation of 
guarded polynomial (i.e. first-order) recursive types in S induces final coalgebras 
for the corresponding polynomial functors on the category Set of usual sets and 
functions. This applies e.g. to streams and colists. Hence, it makes sense to think 
of interpreting least fixpoint formulae over such types externally, in Set. 

Internal (in S)                              External (in Set)
⟦φ⟧ subobject of ⟦A⟧                         {|φ|} subset of Γ⟦A⟧
⟦■A⟧ := ΔΓ⟦A⟧
⟦[box]φ⟧ := Δ{|φ|}   (φ : A, [box]φ : ■A)
{|φ|} = Γ⟦φ⟧   (if φ is safe)


Fig. 2. Internal and External Semantics


The Constant Type Modality. Figure 2 represents adjoint functors Γ : S →
Set and Δ : Set → S. To correctly handle least fixpoints μα.φ : A, we would like
to see them as subsets of Γ⟦A⟧ in Set rather than subobjects of ⟦A⟧ in S. On
the other hand, the internal semantics in S is still necessary to handle definitions
by guarded recursion. We navigate between the internal semantics in S and the
external semantics in Set via the adjunction Δ ⊣ Γ. This adjunction induces a
comonad ΔΓ on S, which is represented in the guarded λ-calculus of [18] by the
constant type modality ■. This gives coinductive versions of guarded recursive
types, e.g. Str A := ■Str^g A for streams and CoList A := ■CoList^g A for colists,
which allow for productive but not causal programs [18, Ex. 1.10.(3)].

Each formula gets two interpretations: ⟦φ⟧ in S and {|φ|} in Set. The external
semantics {|φ|} handles least fixpoints in the standard set-theoretic way, thus the
two interpretations differ in general. But we do have {|φ|} = Γ⟦φ⟧ when φ is safe
(Def. 6.5), that is, when φ describes a safety property. We have a modality [box]
which lifts φ : A to ■A. By defining ⟦[box]φ⟧ := Δ{|φ|}, we correctly handle
the least fixpoints which are guarded by a [box] modality. When φ is safe, we
can navigate between {■A | [box]φ} and ■{A | φ}, thus making available the
comonad structure of ■ on [box]φ. Note that [box] is unrelated to □.


Approximating Least Fixpoints. For proving liveness properties on functions
defined by guarded recursion, one needs to navigate between e.g. [box]◇φ
and ◇φ, while ◇φ is in general unsafe. The fixpoint ◇φ = μα. φ ∨ ○α is
alternation-free (see e.g. [16, §4.1]). This implies that ◇φ can be seen as the
supremum of the ◇ᵐφ for m ∈ N, where each ◇ᵐφ is safe when φ is safe. More
generally, we can approximate alternation-free μαφ by their finite unfoldings
φᵐ(⊥), à la Kleene. We extend the logic with finite iterations μᵏαφ, where k is
an iteration variable, and where μᵏαφ is seen as φᵏ(⊥). Let ◇ᵏφ := μᵏα. φ ∨ ○α.
If φ is safe then so is ◇ᵏφ. For safe φ, ψ, we have the following refinement typings
for the guarded recursive mapᵍ and its coinductive lift map:


mapᵍ : ({B | ψ} → {A | φ}) → {Strᵍ B | ◇ᵏ[hd]ψ} → {Strᵍ A | ◇ᵏ[hd]φ}
map  : ({B | ψ} → {A | φ}) → {Str B | [box]◇[hd]ψ} → {Str A | [box]◇[hd]φ}
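
As an informal illustration of these approximants (not part of the formal system), the following Python sketch models ◇ᵏ[hd]φ on lazily generated streams as "some element among the first k satisfies φ". The names `eventually_within`, `smap`, `naturals` and the sample predicates are hypothetical and chosen only for this example; Python iterators stand in for streams, and no typing discipline is enforced.

```python
from itertools import count, islice

def eventually_within(k, pred, stream):
    """Informal model of the k-th approximant of 'eventually [hd]pred':
    true iff one of the first k elements satisfies pred (k = 0 gives False)."""
    return any(pred(x) for x in islice(stream, k))

def smap(f, stream):
    """Elementwise map on a (possibly infinite) stream, produced lazily."""
    for x in stream:
        yield f(x)

def naturals():
    """The stream 0, 1, 2, ..."""
    return count(0)

# Assumed sample data: square sends inputs satisfying is_big
# to outputs satisfying is_big_sq.
is_big = lambda n: n >= 3
is_big_sq = lambda n: n >= 9
square = lambda n: n * n

# For each bound k, if the input satisfies the k-th approximant for is_big,
# the mapped stream satisfies the k-th approximant for is_big_sq.
for k in range(10):
    if eventually_within(k, is_big, naturals()):
        assert eventually_within(k, is_big_sq, smap(square, naturals()))
```

The loop mirrors the shape of the refined type of mapᵍ: the same bound k is used on the input and on the output stream.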


3 The Pure Calculus 


Our system lies on top of the guarded λ-calculus of [18]. We briefly review it
here. We consider values and terms from the grammar given in Fig. 3 (left). In

  v    ::= λx.M | ⟨M₀, M₁⟩ | ⟨⟩ | in₀(M) | in₁(M) | fold(M) | box_σ(M) | next(M)
  M, N ::= v | x | M N | π₀(M) | π₁(M) | case M of (x.M₀ | x.M₁)
         | unfold(M) | unbox(M) | prev_σ(M) | M ⊛ N | fix(x).M
  E    ::= • | E M | π₀(E) | π₁(E) | case E of (x.M₀ | x.M₁)
         | unfold(E) | unbox(E) | prev_[](E) | E ⊛ M | v ⊛ E

  (λx.M) N                         ⇝  M[N/x]
  π_i(⟨M₀, M₁⟩)                    ⇝  M_i
  case in_i(M) of (x.N₀ | x.N₁)    ⇝  N_i[M/x]
  unfold(fold(M))                  ⇝  M
  fix(x).M                         ⇝  M[next(fix(x).M)/x]
  next(M) ⊛ next(N)                ⇝  next(M N)
  unbox(box_σ(M))                  ⇝  Mσ
  prev_[](next(M))                 ⇝  M
  prev_σ(M)                        ⇝  prev_[](Mσ)      (σ ≠ [])
  E[M]                             ⇝  E[N]             whenever M ⇝ N


Fig. 3. Syntax and Operational Semantics of the Pure Calculus. 


both box_σ(M) and prev_σ(M), σ is a delayed substitution of the form σ = [x₁ ↦
M₁, ..., x_k ↦ M_k] and such that box_σ(M) and prev_σ(M) bind x₁, ..., x_k in M.
We use the following conventions of [18]: box(M) and prev(M) (without indicated
substitution) stand resp. for box_[](M) and prev_[](M), i.e. bind no variable of M.
Moreover, box_ι(M) stands for box_[x₁↦x₁,...,x_k↦x_k](M) where x₁, ..., x_k is a list
of all free variables of M, and similarly for prev_ι(M). We consider the weak
call-by-name reduction of [18], recalled in Fig. 3 (right).
Pure types (notation A, B, etc.) are the closed types over the grammar


A ::= 1 | A + A | A × A | A → A | ▸A | X | Fix(X).A | ■A


where, (1) in the case Fix(X).A, each occurrence of X in A must be guarded by a
▸, and (2) in the case of ■A, the type A is closed (i.e. has no free type variable).
Guarded recursive types are built with the fixpoint constructor Fix(X).A, which
allows for X to appear in A both at positive and negative positions, but only
under a ▸. In this paper we shall only consider positive types.


Example 3.1. We can code a finite base type B = {b₁, ..., bₙ} as a sum of
unit types Σ_{i=1}^{n} 1 = 1 + (⋯ + 1), where the ith component of the sum is
intended to represent the element b_i of B. At the term level, the elements of B
are represented as compositions of injections in_{j₁}(in_{j₂}(... in_{j_i}⟨⟩)). For instance,
Booleans are represented by Bool := 1 + 1, with tt := in₀(⟨⟩) and ff := in₁(⟨⟩).
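
The coding of Ex. 3.1 can be sketched as follows. This is only an illustration under assumed conventions: injections are modelled as tagged pairs `("in0", t)` / `("in1", t)`, the unit value ⟨⟩ as the empty tuple, and the sum is taken right-nested as 1 + (1 + (⋯ + 1)). The function names `encode` and `decode` are hypothetical.

```python
def encode(i, n):
    """Encode the i-th element (1-based) of an n-element base type
    as nested injections into 1 + (1 + (... + 1))."""
    if not 1 <= i <= n:
        raise ValueError("index out of range")
    if n == 1:
        return ()                      # the unit value
    if i == 1:
        return ("in0", ())             # left injection of the unit value
    return ("in1", encode(i - 1, n - 1))

def decode(term, n):
    """Inverse of encode: recover the 1-based index of an encoded element."""
    if n == 1:
        if term != ():
            raise ValueError("ill-formed term")
        return 1
    tag, body = term
    if tag == "in0":
        return 1
    return 1 + decode(body, n - 1)

# Booleans as 1 + 1: tt is the left injection, ff the right one.
tt = encode(1, 2)
ff = encode(2, 2)
```

With n = 2 this yields `tt == ("in0", ())` and `ff == ("in1", ())`, matching tt := in₀(⟨⟩) and ff := in₁(⟨⟩).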


Example 3.2. Besides streams (Strᵍ A), colists (CoListᵍ A), conatural numbers
(CoNatᵍ) and infinite binary trees (Treeᵍ A), we consider a type Resᵍ A of
resumptions (parametrized by I, O) adapted from [44], and a higher-order recursive
type Rouᵍ A, used in Martin Hofmann’s breadth-first tree traversal (see e.g. [10]):


Treeᵍ A := Fix(X). A × (▸X × ▸X)          CoNatᵍ := Fix(X). 1 + ▸X
Resᵍ A  := Fix(X). A + (I → (O × ▸X))     Rouᵍ A := Fix(X). 1 + ((▸X → ▸A) → A)


Some typing rules of the pure calculus are given in Fig. 4, where a pure type A
is constant if each occurrence of ▸ in A is guarded by a ■. The omitted rules
are the standard ones for simple types with finite sums and products [28, §A].

  E ⊢ M : A[Fix(X).A/X]              E ⊢ M : Fix(X).A
  ──────────────────────             ──────────────────────────────
  E ⊢ fold(M) : Fix(X).A             E ⊢ unfold(M) : A[Fix(X).A/X]

  E ⊢ M : ▸(B → A)    E ⊢ N : ▸B      E ⊢ M : A
  ──────────────────────────────      ──────────────────
  E ⊢ M ⊛ N : ▸A                      E ⊢ next(M) : ▸A

  x₁ : A₁, ..., x_k : A_k ⊢ M : ▸A    E ⊢ M_i : A_i with A_i constant for 1 ≤ i ≤ k
  ─────────────────────────────────────────────────────────────────────────────
  E ⊢ prev_[x₁↦M₁,...,x_k↦M_k](M) : A

  x₁ : A₁, ..., x_k : A_k ⊢ M : A     E ⊢ M_i : A_i with A_i constant for 1 ≤ i ≤ k
  ─────────────────────────────────────────────────────────────────────────────
  E ⊢ box_[x₁↦M₁,...,x_k↦M_k](M) : ■A

  E ⊢ M : ■A
  ──────────────────
  E ⊢ unbox(M) : A


Fig. 4. Typing Rules of the Pure Calculus (excerpt). 


Example 3.3. Figure 1 defines some operations on guarded streams. On other
types of Ex. 3.2, we have e.g. the constructors of colists Nilᵍ := fold(in₀⟨⟩) :
CoListᵍ A and Consᵍ := λx.λxs.fold(in₁⟨x, xs⟩) : A → ▸CoListᵍ A → CoListᵍ A.
Infinite binary trees Treeᵍ A have operations sonᵍ_d : Treeᵍ A → ▸Treeᵍ A for d ∈
{ℓ, r}, Nodeᵍ : A → ▸Treeᵍ A → ▸Treeᵍ A → Treeᵍ A and labelᵍ : Treeᵍ A → A.


Example 3.4. Coinductive types are guarded recursive types under a ■. For
instance Str A := ■Strᵍ A, CoList A := ■CoListᵍ A, CoNat := ■CoNatᵍ and
Res A := ■Resᵍ A, with A, I, O constant. Basic operations on guarded types lift
to coinductive ones. For instance


Cons := λx.λs.box_ι(Consᵍ x next(unbox s))   : A → Str A → Str A
hd   := λs.hdᵍ (unbox s)                     : Str A → A
tl   := λs.box_ι(prev_ι(tlᵍ (unbox s)))      : Str A → Str A


These definitions follow a general pattern to lift a function over a guarded
recursive type into one over its coinductive version, by performing an η-expansion
with some box and unbox inserted in the right places. For example, one can define
the map function on coinductive streams as:


map := λf.λs.box_ι(mapᵍ f (unbox s)) : (B → A) → Str B → Str A
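
The stream operations above can be mimicked in an untyped setting. The sketch below is an assumption-based Python model, not the calculus itself: a guarded stream is a pair of a head and a zero-argument function (a thunk) playing the role of the ▸-delayed tail, `next` corresponds to wrapping a value in a thunk, and box/unbox are left implicit because Python has no type modalities. All function names are hypothetical.

```python
def cons_g(x, later_s):
    """Guarded cons: a head together with a delayed tail (a thunk)."""
    return (x, later_s)

def hd_g(s):
    """Head of a guarded stream."""
    return s[0]

def tl_g(s):
    """Delayed tail of a guarded stream (still a thunk)."""
    return s[1]

def map_g(f, s):
    """Guarded map: the recursive call only happens under the delay."""
    return cons_g(f(hd_g(s)), lambda: map_g(f, tl_g(s)()))

def tl(s):
    """Coinductive tail: the delay is forced immediately (the role of prev)."""
    return tl_g(s)()

def nats_from(n):
    """The stream n, n+1, n+2, ..."""
    return cons_g(n, lambda: nats_from(n + 1))

def take(k, s):
    """First k elements of a stream, as a Python list."""
    out = []
    for _ in range(k):
        out.append(hd_g(s))
        s = tl(s)
    return out
```

For instance, `take(4, map_g(lambda n: 2 * n, nats_from(0)))` evaluates the first four elements of the doubled stream of naturals.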


4 A Temporal Modal Logic 


We present here a logic of (modal) temporal specifications. We focus on syntactic
aspects. The semantics is discussed in §7. For the moment the logic has only one
form of fixpoints (ναφ). It is extended with least fixpoints (μαφ) in §6.


Manysorted Modal Temporal Formulae. The main ingredient of this paper
is the logical language we use to annotate pure types when forming
refinement types. This language, which we took with minor adaptations from [80],
is manysorted: for each pure type A we have formulae φ of type A (notation
⊢ φ : A). The formation rules of formulae are given in Fig. 5.


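Example 4.1. Given a finite base type B = {b₁, ..., bₙ} as in Ex. 3.1, with
element b_i represented by in_{j₁}(in_{j₂}(... in_{j_i}⟨⟩)), the formula [in_{j₁}][in_{j₂}] ... [in_{j_i}]⊤
represents the singleton subset {b_i} of B. On Bool, we have the formulae [tt] :=
[in₀]⊤ and [ff] := [in₁]⊤ representing resp. tt and ff.

  (α : A) ∈ Σ                                              Σ ⊢ φ : A
  ────────────     ────────────     ────────────     ──────────────────
  Σ ⊢ α : A        Σ ⊢ ⊥ : A        Σ ⊢ ⊤ : A        Σ, α : B ⊢ φ : A

  Σ ⊢ φ : A   Σ ⊢ ψ : A      Σ ⊢ φ : A   Σ ⊢ ψ : A      Σ ⊢ φ : A   Σ ⊢ ψ : A
  ─────────────────────      ─────────────────────      ─────────────────────
  Σ ⊢ φ ⇒ ψ : A              Σ ⊢ φ ∧ ψ : A              Σ ⊢ φ ∨ ψ : A

  Σ ⊢ φ : A_i                Σ ⊢ φ : A_i                  Σ ⊢ ψ : B   Σ ⊢ φ : A
  ─────────────────────      ──────────────────────      ───────────────────────
  Σ ⊢ [π_i]φ : A₀ × A₁       Σ ⊢ [in_i]φ : A₀ + A₁       Σ ⊢ [ev(ψ)]φ : B → A

  Σ ⊢ φ : A[Fix(X).A/X]      Σ ⊢ φ : A                   ⊢ φ : A
  ───────────────────────    ───────────────────         ─────────────────
  Σ ⊢ [fold]φ : Fix(X).A     Σ ⊢ [next]φ : ▸A            ⊢ [box]φ : ■A

          Σ, α : A ⊢ φ : A     α Pos φ
  (ν-F)   ─────────────────────────────   (α guarded in φ)
          Σ ⊢ ναφ : A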


Fig. 5. Formation Rules of Formulae (where A, B are pure types). 


Example 4.2. (a) The formula [hd][a] ⇒ ○[hd][b] means that if the head of a
stream is a, then its second element (the head of its tail) should be b.

(b) On colists, we let [hd]φ := [fold][in₁][π₀]φ and ○φ := [fold][in₁][π₁][next]φ.

(c) On (guarded) infinite binary trees over A, we also have a modality [lbl]φ :=
[fold][π₀]φ : Treeᵍ A (provided φ : A). Moreover, we have modalities ○_ℓ and
○_r defined on formulae φ : Treeᵍ A as ○_ℓ φ := [fold][π₁][π₀][next]φ and
○_r φ := [fold][π₁][π₁][next]φ. Intuitively, [lbl]φ should hold on a tree t over
A iff the root label of t satisfies φ, and ○_ℓ φ (resp. ○_r φ) should hold on t
iff φ holds on the left (resp. right) immediate subtree of t.


Formulae have fixpoints ναφ. The rules of Fig. 5 thus allow for the formation
of formulae with free typed propositional variables (ranged over by α, β, ...),
and involve contexts Σ of the form α₁ : A₁, ..., αₙ : Aₙ. In the formation of a
fixpoint, the side condition “α guarded in φ” asks that each occurrence of α is
beneath a [next] modality. Because we are ultimately interested in the external
set-theoretic semantics of formulae, we assume a usual positivity condition of α
in φ. It is defined with relations α Pos φ and α Neg φ (see [28, §B]). We just
mention here that [ev(−)](−) is contravariant in its first argument. Note that
[box] can only be formed for closed φ.


Example 4.3. (a) The modality □ makes it possible to express a range of safety
properties. For instance, assuming φ, ψ : Strᵍ A, the formula □(ψ ⇒ ○φ)
is intended to hold on a stream s = (s_i | i ≥ 0) iff, for all n ∈ N, if (s_i | i ≥ n)
satisfies ψ, then (s_i | i ≥ n + 1) satisfies φ.

(b) The modality □ has its two CTL-like variants on Treeᵍ A, namely ∀□φ :=
να. φ ∧ (○_ℓ α ∧ ○_r α) and ∃□φ := να. φ ∧ (○_ℓ α ∨ ○_r α). Assuming ψ : A,
∀□[lbl]ψ is intended to hold on a tree t : Treeᵍ A iff all node-labels of t satisfy
ψ, while ∃□[lbl]ψ holds on t iff ψ holds on all nodes of some infinite path
from the root of t.

Name    Formulation
(RM)    from ⊢ ψ ⇒ φ infer ⊢ [△]ψ ⇒ [△]φ
(C)     [△]φ ∧ [△]ψ ⇒ [△](φ ∧ ψ)
(N)     [△]⊤
(P)     [△]⊥ ⇒ ⊥
(C∨)    [△](φ ∨ ψ) ⇒ [△]φ ∨ [△]ψ
(C⇒)    ([△]ψ ⇒ [△]φ) ⇒ [△](ψ ⇒ φ)

Columns (one per modality [△]): [π_i], [fold], [next], [in_i], [ev(ψ)], [box], [hd], ○.
[Per-modality check marks (✓ and (C)) are not recoverable from the extracted text.]


Table 2. Modal Axioms and Rules. Types are omitted in ⊢ and (C) marks axioms
assumed for ⊢_c but not for ⊢. Properties of the non-atomic [hd] and ○ are derived.


Modal Theories. Formulae are equipped with a modal deduction system which
enters the type system via a subtyping relation (§5). For each pure type A, we
have an intuitionistic theory ⊢^A (the general case) and a classical theory ⊢_c^A
(which is only assumed under ■/[box]), summarized in Fig. 6 and Table 2 (where
we also give properties of the derived modalities [hd], ○). In any case, ⊢^A_(c) φ is
only defined when ⊢ φ : A (and so when φ has no free propositional variable).

Fixpoints ναφ are equipped with their usual Kozen axioms [43]. The atomic
modalities [π_i], [fold], [next], [in_i] and [box] have deterministic branching (see
Fig. 12, §7). We can get the axioms of the intuitionistic (normal) modal logic
IK (see also e.g. [60, 48]) for [π_i], [fold] and [box] but not for [in_i] nor for the
intuitionistic [next]. For [next], in the intuitionistic case this is due to semantic
issues with step indexing (discussed in §7) which are absent from the classical
case. As for [in_i], we have a logical theory allowing for a coding of finite base
types as finite sum types, which allows us to derive, for a finite base type B:


⊢^B ⋁_{a∈B} ( [a] ∧ ⋀_{b≠a} ¬[b] )


Definition 4.4 (Modal Theories). For each pure type A, the intuitionistic
and classical modal theories ⊢^A φ and ⊢_c^A φ (where ⊢ φ : A) are defined by
mutual induction:


- The theory ⊢^A is deduction for intuitionistic propositional logic augmented
with the check-marked (✓) axioms and rules of Table 2 and the axioms and
rules of Fig. 6 (for ⊢^A).

- The theory ⊢_c^A is ⊢^A augmented with the axioms (P) and (C⇒) for [next] and
with the axiom (CL) (Fig. 6).


For example, we have ⊢^{Strᵍ A} □ψ ⇒ (ψ ∧ ○□ψ) and ⊢^{Strᵍ A} (ψ ∧ ○□ψ) ⇒ □ψ.

  ⊢^B ψ ⇒ φ      ⊢ χ : A
  ─────────────────────────────
  ⊢^{B→A} [ev(φ)]χ ⇒ [ev(ψ)]χ

  ⊢^{B→A} ([ev(ψ₀)]φ ∧ [ev(ψ₁)]φ) ⇒ [ev(ψ₀ ∨ ψ₁)]φ

  (CL)  ⊢_c^A ((φ ⇒ ψ) ⇒ φ) ⇒ φ

  ⊢_c^A φ
  ────────────────
  ⊢^{■A} [box]φ

  ⊢^{A₀+A₁} ([in₀]⊤ ∨ [in₁]⊤) ∧ ¬([in₀]⊤ ∧ [in₁]⊤)

  ⊢^{A₀+A₁} [in_i]⊤ ⇒ (¬[in_i]φ ⇔ [in_i]¬φ)

  ⊢^A ναφ ⇒ φ[ναφ/α]

  ⊢^A ψ ⇒ φ[ψ/α]
  ────────────────
  ⊢^A ψ ⇒ ναφ

Fig. 6. Modal Axioms and Rules. 
  T ≤ |T|          A ≤ {A | ⊤}

  ⊢^A φ ⇒ ψ                        ⊢_c^A φ ⇒ ψ
  ──────────────────────           ──────────────────────────────────
  {A | φ} ≤ {A | ψ}                {■A | [box]φ} ≤ {■A | [box]ψ}

  {▸A | [next]φ} ≡ ▸{A | φ}        {B → A | [ev(ψ)]φ} ≡ {B | ψ} → {A | φ}


Fig. 7. Subtyping Rules (excerpt). 


5 A Temporally Refined Type System 


Temporal refinement types (or types), notation T, U, V, etc., are defined by:

    T, U ::= A | {A | φ} | T + T | T × T | T → T | ▸T | ■T

where ⊢ φ : A and, in the case of ■T, the type T has no free type variable. So
types are built from (closed) pure types A and temporal refinements {A | φ}.
They allow for all the type constructors of pure types.

As a refinement type {A | φ} intuitively represents a subset of the inhabitants
of A, it is natural to equip our system with a notion of subtyping. In addition
to the usual rules for product, arrow and sum types, our subtyping relation is
made of two more ingredients. The first follows the principle that our refinement
type system is meant to prove properties of programs, and not to type more
programs, so that (say) a type of the form {A | φ} → {B | ψ} is a subtype of
A → B. We formalize this with the notion of underlying pure type |T| of a type
T. The second ingredient is the modal theory ⊢^A φ of §4. The subtyping rules
concerning refinements are given in Fig. 7, where T ≡ U enforces both T ≤ U
and U ≤ T. The full set of rules is given in [28, §C]. Notice that subtyping does
not incorporate (un)folding of guarded recursive types.

Typing for refinement types is given by the rules of Fig. 8, together with the
rules of §3 extended to refinement types, where T is constant if |T| is constant.
Modalities [π_i], [in_i], [fold] and [ev(−)] (but not [next]) have introduction rules
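extending those of the corresponding term formers.

            E ⊢ M_i : {A_i | φ}     E ⊢ M_{1−i} : A_{1−i}
  (Pi_i-I)  ──────────────────────────────────────────
            E ⊢ ⟨M₀, M₁⟩ : {A₀ × A₁ | [π_i]φ}

            E ⊢ M : {A₀ × A₁ | [π_i]φ}
  (Pi_i-E)  ────────────────────────────
            E ⊢ π_i(M) : {A_i | φ}

            E, x : {B | ψ} ⊢ M : {A | φ}
  (Ev-I)    ──────────────────────────────────
            E ⊢ λx.M : {B → A | [ev(ψ)]φ}

            E ⊢ M : {B → A | [ev(ψ)]φ}     E ⊢ N : {B | ψ}
  (Ev-E)    ─────────────────────────────────────────────
            E ⊢ M N : {A | φ}

            E ⊢ M : {A[Fix(X).A/X] | φ}
  (Fd-I)    ──────────────────────────────────────
            E ⊢ fold(M) : {Fix(X).A | [fold]φ}

            E ⊢ M : {Fix(X).A | [fold]φ}
  (Fd-E)    ──────────────────────────────────────
            E ⊢ unfold(M) : {A[Fix(X).A/X] | φ}

            E ⊢ M : {A_i | φ}
  (Inj_i-I) ──────────────────────────────────────
            E ⊢ in_i(M) : {A₀ + A₁ | [in_i]φ}

            E ⊢ M : {A₀ + A₁ | [in_i]φ}   E, x : {A_i | φ} ⊢ N_i : U   E, x : A_{1−i} ⊢ N_{1−i} : U
  (Inj_i-E) ─────────────────────────────────────────────────────────────────────────────
            E ⊢ case M of (x.N₀ | x.N₁) : U

            for i ∈ {0, 1},
            E ⊢ M : {A | φ₀ ∨ φ₁}     E, x : {A | φ_i} ⊢ N : U
  (∨-E)     ──────────────────────────────────────────────────
            E ⊢ N[M/x] : U

            E ⊢ M : {A | ψ ⇒ φ}     E ⊢ M : {A | ψ}
  (MP)      ──────────────────────────────────────
            E ⊢ M : {A | φ}

            E ⊢ M : {A | ⊥}     E ⊢ N : |U|
  (ExF)     ──────────────────────────────
            E ⊢ N : U

            E ⊢ M : T     T ≤ U
  (Sub)     ────────────────────
            E ⊢ M : U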


Fig. 8. Typing Rules for Refined Modal Types. 


Example 5.1. Since ⊢ φ ⇒ ψ ⇒ (φ ∧ ψ) and using two times the rule (MP), we
get the first derived rule below, from which we can deduce the second one:


  E ⊢ M : {A | φ}     E ⊢ M : {A | ψ}        E ⊢ M : {A | φ}     E ⊢ N : {B | ψ}
  ──────────────────────────────────        ───────────────────────────────────────
  E ⊢ M : {A | φ ∧ ψ}                        E ⊢ ⟨M, N⟩ : {A × B | [π₀]φ ∧ [π₁]ψ}

Example 5.2. We have the following derived rules:

  E ⊢ M : {Strᵍ A | □φ}                      E ⊢ M : {Strᵍ A | φ ∧ ○□φ}
  ─────────────────────────────              ─────────────────────────────
  E ⊢ M : {Strᵍ A | φ ∧ ○□φ}                 E ⊢ M : {Strᵍ A | □φ}


Example 5.3. We have Consᵍ : A → ▸{Strᵍ A | φ} → {Strᵍ A | ○φ} as well as
tlᵍ : {Strᵍ A | ○φ} → ▸{Strᵍ A | φ}.


Example 5.4 (“Always” (□) on Guarded Streams). The refined types of Consᵍ,
hdᵍ, tlᵍ and mapᵍ mentioned in §2 are easy to derive. We also have the type


{Strᵍ A | □[hd]φ₀} → {Strᵍ A | □[hd]φ₁} → {Strᵍ A | □([hd]φ₀ ∨ [hd]φ₁)}


for the mergeᵍ function which takes two guarded streams and interleaves them:


mergeᵍ : Strᵍ A → Strᵍ A → Strᵍ A
       := fix(g).λs₀.λs₁. (hdᵍ s₀) ::ᵍ next((hdᵍ s₁) ::ᵍ (g ⊛ (tlᵍ s₀) ⊛ (tlᵍ s₁)))
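
The interleaving behaviour of mergeᵍ can be observed with a small untyped model. This is a sketch under assumptions: a guarded stream is represented as a pair of a head and a thunk for the delayed tail, and the helper names `cons_g`, `arith` and `take` are hypothetical. Guardedness is respected by convention only, since the recursive call sits under two thunks.

```python
def cons_g(x, later_s):
    """Guarded cons: a head together with a delayed tail (a thunk)."""
    return (x, later_s)

def merge_g(s0, s1):
    """Interleave two streams: head of s0, then head of s1, then recurse on the tails."""
    return cons_g(
        s0[0],
        lambda: cons_g(s1[0], lambda: merge_g(s0[1](), s1[1]())),
    )

def arith(start, step):
    """The stream start, start+step, start+2*step, ..."""
    return cons_g(start, lambda: arith(start + step, step))

def take(k, s):
    """First k elements of a stream, as a Python list."""
    out = []
    for _ in range(k):
        out.append(s[0])
        s = s[1]()
    return out

# If every element of s0 is a multiple of 10 and every element of s1 is negative,
# every element of the merged stream satisfies the disjunction (checked on a prefix).
merged = merge_g(arith(0, 10), arith(-1, -1))
assert all(x % 10 == 0 or x < 0 for x in take(20, merged))
```

The final assertion is a finite-prefix analogue of the refined type given above for mergeᵍ, with φ₀ "multiple of 10" and φ₁ "negative".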


6 The Full System 


The system presented so far has only one form of fixpoints in formulae (ναφ).
We now present our full system, which also handles least fixpoints (μαφ) and
thus liveness properties. A key role is played by polynomial guarded recursive
types, that we discuss first.

         Σ, α : A ⊢ φ : A         Σ, α : A ⊢ φ : A         Σ, α : A ⊢ φ : A
  (μ-F)  ──────────────────       ──────────────────       ──────────────────
         Σ ⊢ μαφ : A              Σ ⊢ μᵗαφ : A             Σ ⊢ νᵗαφ : A


Fig. 9. Extended Formation Rules of Formulae (with α Pos φ and α guarded in φ).


                                  ⊢^A φ[ψ/α] ⇒ ψ
  ⊢^A φ[μαφ/α] ⇒ μαφ              ────────────────
                                  ⊢^A μαφ ⇒ ψ

  ⊢^A θ^{t+1}αφ ⇔ φ[θᵗαφ/α]       ⊢^A μ⁰αφ ⇔ ⊥       ⊢^A ν⁰αφ ⇔ ⊤

  ⟦t⟧ ≤ ⟦u⟧                                        ⟦t⟧ ≥ ⟦u⟧
  ─────────────────     ⊢^A μᵗαφ ⇒ μαφ             ─────────────────     ⊢^A ναφ ⇒ νᵗαφ
  ⊢^A μᵗαφ ⇒ μᵘαφ                                  ⊢^A νᵗαφ ⇒ νᵘαφ


Fig. 10. Extended Modal Axioms and Rules (with A a pure type and θ either μ or ν).


Strictly Positive and Polynomial Types. Strictly positive types (notation
P⁺, Q⁺, etc.) are given by


P⁺ ::= A | X | ▸P⁺ | P⁺ + P⁺ | P⁺ × P⁺ | Fix(X).P⁺ | B → P⁺


where A, B are (closed) constant pure types. Strictly positive types are a
convenient generalization of polynomial types. A guarded recursive type Fix(X).P(X)
is polynomial if P(X) is induced by


P(X) ::= A | ▸X | P(X) + P(X) | P(X) × P(X) | B → P(X)


where A, B are (closed) constant pure types. Note that if Fix(X).P(X) is
polynomial, X cannot occur on the left of an arrow (→) in P(X). We say that
Fix(X).P(X) (resp. P⁺) is finitary polynomial (resp. finitary strictly positive)
if B is a finite base type (see Ex. 3.1) in the above grammars. The set-theoretic
counterpart of our polynomial recursive types are the exponent polynomial
functors of [BI], which all have final Set-coalgebras (see e.g. [BI, Cor. 4.6.3]).


Example 6.1. For A a constant pure type, e.g. Strᵍ A, CoListᵍ A and Treeᵍ A as
well as Strᵍ(Str A), CoListᵍ(Str A) and Resᵍ A (with I, O constant) are
polynomial. More generally, polynomial types include all recursive types Fix(X).P(X)
where P(X) is of the form Σ_{i=0}^{n} A_i × (▸X)^{B_i} with A_i, B_i constant. The
non-strictly positive recursive type Rouᵍ A of Ex. 3.2, used in Hofmann’s breadth-first
traversal (see e.g. [10]), is not polynomial.


The Full Temporal Modal Logic. We assume given a first-order signature
of iteration terms (notation t, u, etc.), with iteration variables k, ℓ, etc., and for
each iteration term t(k₁, ..., k_m) with variables as shown, a given primitive
recursive function ⟦t⟧ : Nᵐ → N. We assume a term 0 for 0 ∈ N and a term k+1
for the successor function n ∈ N ↦ n + 1 ∈ N.
The formulae of the full temporal modal logic extend those of Fig. 5 with least
fixpoints μαφ and with approximated fixpoints μᵗαφ and νᵗαφ where t is an
iteration term. The additional formation rules for formulae are given in Fig. 9. We
use θ as a generic notation for μ and ν. Least fixpoints μαφ are equipped with
their usual Kozen axioms. In addition, iteration formulae νᵗαφ(α) and μᵗαφ(α)
have axioms expressing that they are indeed iterations of φ(α) from resp. ⊤ and
⊥. A fixpoint logic with iteration variables was already considered in [63].
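
As a small worked instance of the iteration axioms (base case at the term 0 and one unfolding step at a successor term), the first approximants of μα. φ ∨ ○α unfold as follows. The derivation is only illustrative; ○⊥ is deliberately left unsimplified, since axiom (P) is not assumed for the intuitionistic [next].

```latex
\begin{align*}
\mu^{0}\alpha.\,(\varphi \lor \bigcirc\alpha)
  &\;\Leftrightarrow\; \bot \\
\mu^{0+1}\alpha.\,(\varphi \lor \bigcirc\alpha)
  &\;\Leftrightarrow\; \varphi \lor \bigcirc\bigl(\mu^{0}\alpha.\,(\varphi \lor \bigcirc\alpha)\bigr)
  \;\Leftrightarrow\; \varphi \lor \bigcirc\bot \\
\mu^{(0+1)+1}\alpha.\,(\varphi \lor \bigcirc\alpha)
  &\;\Leftrightarrow\; \varphi \lor \bigcirc\bigl(\mu^{0+1}\alpha.\,(\varphi \lor \bigcirc\alpha)\bigr)
  \;\Leftrightarrow\; \varphi \lor \bigcirc(\varphi \lor \bigcirc\bot)
\end{align*}
```

Each line applies the successor axiom once and then substitutes the previous line under ○, which uses monotonicity of the modality.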


Definition 6.2 (Full Modal Theories). The full intuitionistic and classical
modal theories (still denoted ⊢^A and ⊢_c^A) are defined by extending Def. 4.4 with
the axioms and rules of Fig. 10.


Example 6.3. Least fixpoints allow us to define liveness properties. On streams
and colists, we have ◇φ := μα. φ ∨ ○α and φ U ψ := μα. ψ ∨ (φ ∧ ○α).
On trees, we have the CTL-like ∃◇φ := μα. φ ∨ (○_ℓ α ∨ ○_r α) and ∀◇φ :=
μα. φ ∨ (○_ℓ α ∧ ○_r α). The formula ∃◇φ is intended to hold on a tree if there
is a finite path which leads to a subtree satisfying φ, while ∀◇φ is intended to
hold if every infinite path crosses a subtree satisfying φ.
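
To make the fixpoint readings of these formulae concrete, the sketch below evaluates them by Kleene iteration on an eventually periodic stream, represented as a finite "lasso" of positions. This finite model is an assumption made for illustration only (it is not the semantic domain used in this paper), and all function names are hypothetical. A formula denotes the set of positions whose suffix satisfies it.

```python
def lasso(labels, loop_start):
    """Successor table of the stream labels[0..] that loops back to loop_start."""
    succ = list(range(1, len(labels) + 1))
    succ[-1] = loop_start
    return succ

def holds(labels, letters):
    """Positions whose head letter belongs to `letters`."""
    return frozenset(i for i, c in enumerate(labels) if c in letters)

def nxt(succ, S):
    """The 'next' modality: positions whose successor lies in S."""
    return frozenset(i for i, j in enumerate(succ) if j in S)

def lfp(f):
    """Least fixpoint of a monotone map, iterated from the empty set."""
    S = frozenset()
    while f(S) != S:
        S = f(S)
    return S

def gfp(f, top):
    """Greatest fixpoint of a monotone map, iterated from the full set."""
    S = top
    while f(S) != S:
        S = f(S)
    return S

def eventually(succ, phi):
    return lfp(lambda S: phi | nxt(succ, S))

def until(succ, phi, psi):
    return lfp(lambda S: psi | (phi & nxt(succ, S)))

def always(succ, phi):
    return gfp(lambda S: phi & nxt(succ, S), frozenset(range(len(succ))))

# The stream a a b (c b)(c b)... : positions 0..4, with position 4 looping to 3.
labels = "aabcb"
succ = lasso(labels, 3)
```

On this example, `eventually(succ, holds(labels, "c"))` contains every position, while `always(succ, holds(labels, "bc"))` contains exactly the positions from the first b onwards.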


Remark 6.4. On finitary trees (as in Ex. 6.1 but with A_i, B_i finite base types),
we have all formulae of the modal μ-calculus. For this fragment, satisfiability is
decidable (see e.g. [16]), as well as the classical theory ⊢_c by completeness of
Kozen’s axiomatization [68] (see [58] for completeness results on fragments of
the μ-calculus).


The Safe and Smooth Fragments. We now discuss two related but dis- 
tinct fragments of the temporal modal logic. Both fragments directly impact the 
refinement type system by allowing for more typing rules. 

The safe fragment plays a crucial role, because it reconciles the internal and
external semantics of our system (see §7). It gives subtyping rules for ■ (Fig. 11),
which makes available the comonad structure of ■ on [box]φ when φ is safe.


Definition 6.5 (Safe Formula). Say α₁ : A₁, ..., αₙ : Aₙ ⊢ φ : A is safe if


(i) the types A₁, ..., Aₙ, A are strictly positive, and
(ii) for each occurrence in φ of a modality [ev(ψ)], the formula ψ is closed, and
(iii) each occurrence in φ of a least fixpoint (μα(−)) and of an implication (⇒)
is guarded by a [box].
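
Conditions (ii) and (iii) of Def. 6.5 are purely syntactic and can be sketched as a recursive check. The encoding below is an assumption made for illustration: formulae are nested Python tuples with hypothetical tags, condition (i) on types is not modelled, approximated fixpoints are accepted without guard, and the formula ψ inside [ev(ψ)] is also traversed (one possible reading of "each occurrence in φ").

```python
def free_vars(f):
    """Free propositional variables of a formula given as a nested tuple."""
    tag = f[0]
    if tag == "var":
        return {f[1]}
    if tag in ("top", "bot"):
        return set()
    if tag in ("and", "or", "imp", "ev"):      # ("ev", psi, phi) stands for [ev(psi)]phi
        return free_vars(f[1]) | free_vars(f[2])
    if tag == "mod":                           # ("mod", name, phi): [pi_i], [in_i], [fold], [next]
        return free_vars(f[2])
    if tag == "box":
        return free_vars(f[1])
    if tag in ("mu", "nu"):                    # ("mu", alpha, phi)
        return free_vars(f[2]) - {f[1]}
    if tag in ("mu_t", "nu_t"):                # ("mu_t", t, alpha, phi): approximated fixpoint
        return free_vars(f[3]) - {f[2]}
    raise ValueError(f"unknown formula tag: {tag}")

def is_safe(f, boxed=False):
    """Check conditions (ii) and (iii); `boxed` records whether we are under a [box]."""
    tag = f[0]
    if tag in ("var", "top", "bot"):
        return True
    if tag in ("and", "or"):
        return is_safe(f[1], boxed) and is_safe(f[2], boxed)
    if tag == "imp":
        return boxed and is_safe(f[1], boxed) and is_safe(f[2], boxed)
    if tag == "mod":
        return is_safe(f[2], boxed)
    if tag == "box":
        return is_safe(f[1], True)
    if tag == "ev":
        return not free_vars(f[1]) and is_safe(f[1], boxed) and is_safe(f[2], boxed)
    if tag == "mu":
        return boxed and is_safe(f[2], boxed)
    if tag == "nu":
        return is_safe(f[2], boxed)
    if tag in ("mu_t", "nu_t"):
        return is_safe(f[3], boxed)
    raise ValueError(f"unknown formula tag: {tag}")

# Stream modalities built from atomic ones (the bound variable name "a" is assumed fresh).
def nxt(p):        return ("mod", "fold", ("mod", "pi1", ("mod", "next", p)))
def hd(p):         return ("mod", "fold", ("mod", "pi0", p))
def always(p):     return ("nu", "a", ("and", p, nxt(("var", "a"))))
def eventually(p): return ("mu", "a", ("or", p, nxt(("var", "a"))))
```

With these definitions, `is_safe(always(hd(("top",))))` holds, `is_safe(eventually(hd(("top",))))` fails, and wrapping the latter in `("box", ...)` makes it pass again.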


Note that the safe restriction imposes no condition on approximated fixpoints
θᵗα. Recalling that the theory under a [box] is ⊢_c^A, the only propositional
connectives accessible to ⊢^A in safe formulae are those on which ⊢^A and ⊢_c^A coincide.
The formula [¬nil] := [fold][in₁]⊤ is safe. Moreover:


Example 6.6. Any formula without fixpoint nor [ev(−)] is equivalent in ⊢_c to a
safe one. If φ is safe, then so are [hd]φ, [lbl]φ, as well as △φ (for △ ∈ {□, ∀□, ∃□})
and [box]△φ (for △ ∈ {◇, ∃◇, ∀◇}).
Definition 6.7 (Smooth Formula). A formula α₁ : A₁, ..., αₙ : Aₙ ⊢ φ : A
is smooth if


(i) the types A₁, ..., Aₙ, A are finitary strictly positive, and
(ii) for each occurrence in φ of a modality [ev(ψ)], the formula ψ is closed, and
(iii) φ is alternation-free: for θ, θ′ ∈ {μ, ν}, (1) if θβ₀φ₀ is a subformula of φ,
and θ′β₁φ₁ is a subformula of φ₀ s.t. β₀ occurs free in φ₁, then θ = θ′, (2)
if some α_i occurs in two subformulae θβ₀φ₀ and θ′β₁φ₁ of φ, then θ = θ′,
and (3) if some α_i occurs in a subformula θβψ of φ, then α_i Pos ψ.


Our notion of alternation freedom is adapted from [16], in which propositional
(fixpoint) variables are always positive. Note that the smooth restriction imposes
no further conditions on approximated fixpoints θᵗα. In the smooth fragment,
greatest and least fixpoints can be thought about resp. as


    ⋀_{m∈N} φᵐ(⊤)     and     ⋁_{m∈N} φᵐ(⊥)


Iteration terms allow for formal reasoning about such unfoldings. Assuming ⟦t⟧ =
m ∈ N, the formula νᵗαφ(α) (resp. μᵗαφ(α)) can be read as φᵐ(⊤) (resp.
φᵐ(⊥)). This gives the rules (ν-I) and (μ-E) (Fig. 11), which allow for reductions
to the safe case (see examples in §8).


Remark 6.8. It is well-known (see e.g. [16, §4.1]) that on finitary trees (see
Rem. 6.4) the alternation-free fragment is equivalent to Weak MSO (MSO with
second-order variables restricted to finite sets). In the case of streams Str B (for a
finite base type B), Weak MSO is in turn equivalent to the full modal μ-calculus.
In particular, the alternation-free fragment contains all the flat fixpoints of [58]
and thus LTL on Str B and CTL on Tree B and on Res B with I, O, B finite base
types. A typical property on Tree B which cannot be expressed with
alternation-free formulae is “there is an infinite path with infinitely many occurrences of b”
for a fixed b : B (see e.g. [16, §2.2]).


Example 6.9. Any formula without fixpoint nor [ev(−)] is smooth. If φ is smooth,
then so are [hd]φ, [lbl]φ and △φ for △ ∈ {□, ∀□, ∃□, ◇, ∃◇, ∀◇}.


The Full System. We extend the types of §5 with universal quantification over
iteration variables (∀k · T). The type system of §5 is extended with the rules of
Fig. 11.


Example 6.10. The logical rules of Fig. 10 give the following derived typing rules
(where β Pos γ):


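         E ⊢ M : {■A | [box]γ[μᵗαφ/β]}              E ⊢ M : {■A | [box]γ[ναφ/β]}
  (μ-I)  ───────────────────────────────     (ν-E)  ───────────────────────────────
         E ⊢ M : {■A | [box]γ[μαφ/β]}               E ⊢ M : {■A | [box]γ[νᵗαφ/β]}

  φ safe
  ──────────────────────────────          ∀k · ▸T ≡ ▸∀k · T
  {■A | [box]φ} ≡ ■{A | φ}

         E ⊢ M : T                        E ⊢ M : T[0/k]     E ⊢ M : T[k+1/k]
  (∀-I)  ───────────────────     (∀-CI)  ───────────────────────────────────
         E ⊢ M : ∀k · T                   E ⊢ M : ∀k · T

         E ⊢ M : {■A | [box]γ[ν^ℓ αφ/β]}            E ⊢ M : ∀k · T
  (ν-I)  ─────────────────────────────────   (∀-E)  ───────────────────
         E ⊢ M : {■A | [box]γ[ναφ/β]}               E ⊢ M : T[t/k]

         E ⊢ M : {■A | [box]γ[μαφ/β]}     E, x : {■A | [box]γ[μ^ℓ αφ/β]} ⊢ N : U
  (μ-E)  ─────────────────────────────────────────────────────────────────────
         E ⊢ N[M/x] : U


Fig. 11. Extended (Sub)Typing Rules for Refinement Types (where k is not free in E
in (∀-I) & (∀-CI), ℓ is fresh in (ν-I) & (μ-E), θαφ and γ are smooth, and β Pos γ).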


7 Semantics 


We present the main ingredients of the semantics of our type system. We take 
as base the denotational semantics of guarded recursion in the topos of trees. 


Denotational Semantics in the Topos of Trees. The topos of trees S
provides a natural model of guarded recursion [13]. Formally, S is the category
of presheaves over (N \ {0}, ≤). In words, the objects of S are indexed sets
X = (X(n))_{n>0} equipped with restriction maps r_n^X : X(n + 1) → X(n).
Excluding 0 from the indexes is a customary notational convenience ([13]). The
morphisms from X to Y are families of functions f = (f_n : X(n) → Y(n))_{n>0}
which commute with restriction, that is f_n ∘ r_n^X = r_n^Y ∘ f_{n+1}. As any presheaf
category, S has (pointwise) limits and colimits, and is Cartesian closed (see e.g. [47,
§I.6]). We write Γ : S → Set for the global section functor, which takes X to
S[1, X], the set of morphisms 1 → X in S, where 1 = ({•})_{n>0} is terminal in S.
A typed term E ⊢ M : T is to be interpreted in S as a morphism


⟦M⟧ : ⟦|E|⟧ → ⟦|T|⟧


where ⟦|E|⟧ = ⟦|T₁|⟧ × ⋯ × ⟦|Tₙ|⟧ for E = x₁ : T₁, ..., xₙ : Tₙ. In particular, a
closed term M : T is to be interpreted as a global section ⟦M⟧ ∈ Γ⟦|T|⟧. The
×/+/→ fragment of the calculus is interpreted by the corresponding structure
in S. The ▸ modality is interpreted by the functor ▸ : S → S of [13]. This functor
shifts indexes by 1 and inserts a singleton set 1 at index 1. The term constructor
next is interpreted by the natural map with component next^X : X → ▸X as in


  X :           X₁  ←─r₁──  X₂  ←─r₂──  X₃  ←── ⋯
                │            │            │
  next^X :      │ !          │ r₁         │ r₂
                ↓            ↓            ↓
  ▸X :          1   ←─!───  X₁  ←─r₁──  X₂  ←── ⋯

  {|[π_i]φ|}   := {x ∈ Γ⟦A₀ × A₁⟧ | π_i ∘ x ∈ {|φ|}}
  {|[next]φ|}  := {next ∘ x ∈ Γ⟦▸A⟧ | x ∈ {|φ|}}
  {|[fold]φ|}  := {x ∈ Γ⟦Fix(X).A⟧ | unfold ∘ x ∈ {|φ|}}
  {|[box]φ|}   := {x ∈ Γ⟦■A⟧ | x₁(•) ∈ {|φ|}}
  {|[in_i]φ|}  := {x ∈ Γ⟦A₀ + A₁⟧ | ∃y ∈ Γ⟦A_i⟧ (x = in_i ∘ y and y ∈ {|φ|})}
  {|[ev(ψ)]φ|} := {x ∈ Γ⟦B → A⟧ | ∀y ∈ Γ⟦B⟧ (y ∈ {|ψ|} ⟹ ev ∘ ⟨x, y⟩ ∈ {|φ|})}


Fig. 12. External Semantics (for closed formulae). 
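
The description of S, of the functor ▸ and of next given in this section can be checked on a small finite example. The sketch below is an illustration under assumptions: the object X is taken to be X(n) = Aⁿ (prefixes of length n over a two-element set A) with truncation as restriction map, which is one convenient example object and is not claimed to be the interpretation used in this paper. All names are hypothetical.

```python
from itertools import product

A = (0, 1)          # a two-element alphabet (assumption)
STAR = ()           # the unique element of the singleton set 1

def X(n):
    """X(n): tuples of length n over A, for n >= 1."""
    return set(product(A, repeat=n))

def r(t):
    """Restriction X(n+1) -> X(n): drop the last component."""
    return t[:-1]

def later_X(n):
    """The shifted object: a singleton at index 1, and X(n-1) at index n > 1."""
    return {STAR} if n == 1 else X(n - 1)

def later_r(t):
    """Restriction of the shifted object, from index n+1 to index n."""
    return STAR if len(t) == 1 else r(t)

def next_X(t):
    """Component of next at index n: the unique map at n = 1, restriction otherwise."""
    return STAR if len(t) == 1 else r(t)

def next_is_natural(max_n):
    """Check that next commutes with restriction for all indexes below max_n."""
    return all(
        next_X(r(t)) == later_r(next_X(t))
        for n in range(1, max_n)
        for t in X(n + 1)
    )
```

`next_is_natural(5)` checks the commuting squares of the diagram above for the first few indexes; the check is finite and only meant as a sanity test of the definitions.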


The guarded fixpoint combinator fix is interpreted by the morphism fix^X :
X^{▸X} → X of [13, Thm. 2.4].

The constant type modality ■ is interpreted as the comonad ΔΓ : S → S,
where the left adjoint Δ : Set → S is the constant object functor, which takes a
set S to the constant family (S)_{n>0}. In words, all components ⟦■A⟧(n) are equal
to Γ⟦A⟧, and the restriction maps of ⟦■A⟧ are identities. In particular, a global
section x ∈ Γ⟦■A⟧ is a constant family (x_n)_n describing a unique global section
x_{n+1}(•) = x_n(•) ∈ Γ⟦A⟧. We refer to [18] and [28, §D] for the interpretation of
prev, box and unbox. Just note that the unit η : Id_Set → ΓΔ is an iso.

Together with an interpretation of guarded recursive types, this gives a
denotational semantics of the pure calculus of §3. See for details. We write fold :
⟦A[Fix(X).A/X]⟧ → ⟦Fix(X).A⟧ and unfold : ⟦Fix(X).A⟧ → ⟦A[Fix(X).A/X]⟧ for
the two components of the iso ⟦Fix(X).A⟧ ≃ ⟦A[Fix(X).A/X]⟧.


External Semantics. Møgelberg [50] has shown that for polynomial types
such as Strᵍ B with B a constant type, the set of global sections Γ⟦Strᵍ B⟧ is
equipped with the usual final coalgebra structure of streams over B in Set. To
each polynomial recursive type Fix(X).P(X), we associate a polynomial functor
P_Set : Set → Set in the obvious way.


Theorem 7.1 ([50] (see also [18])). If Fix(X).P(X) is polynomial, then the
set Γ⟦Fix(X).P(X)⟧ carries a final Set-coalgebra structure for P_Set.


We devise a Set interpretation {|φ|} ∈ P(Γ⟦A⟧) of formulae φ : A. We
rely on the (complete) Boolean algebra structure of powersets for propositional
connectives and on the Knaster-Tarski Fixpoint Theorem for fixpoints μ and ν.
The interpretations of νᵗαφ(α) and μᵗαφ(α) (for t closed) are defined to be
the interpretations resp. of φ^{⟦t⟧}(⊤) and φ^{⟦t⟧}(⊥), where e.g. φ⁰(⊤) := ⊤ and
φ^{n+1}(⊤) := φ(φⁿ(⊤)). We give the cases of the atomic modalities in Fig. 12
(where for simplicity we assume formulae to be closed). It can be checked that,
when restricting to polynomial types, one gets the coalgebraic semantics of [80]
(with sums as in [81]) extended to fixpoints.


Internal Semantics of Formulae. We would like to have adequacy w.r.t. the
external semantics of formulae, namely that given M : {A | φ}, the global section
⟦M⟧ ∈ Γ⟦A⟧ satisfies {|φ|} ∈ P(Γ⟦A⟧) in the sense that ⟦M⟧ ∈ {|φ|}. But in
general we can only have adequacy w.r.t. an internal semantics ⟦φ⟧ ∈ Sub(⟦A⟧)
of formulae φ : A. We sketch it here. First, Sub(X) is the (complete) Heyting
algebra of subobjects of an object X of S. Explicitly, we have S = (S(n))_n ∈
Sub(X) iff for all n > 0, S(n) ⊆ X(n) and r_n^X(t) ∈ S(n) whenever t ∈ S(n + 1).
For propositional connectives and fixpoints, the internal ⟦−⟧ is defined similarly
as the external {|−|}, but using (complete) Heyting algebras of subobjects rather
than (complete) Boolean algebras of subsets.

As for modalities, let [△] be of the form [π_i], [in_i], [next] or [fold], and assume
[△]φ : B whenever φ : A. Standard topos theoretic constructions give poset
morphisms ⟦[△]⟧ : Sub(⟦A⟧) → Sub(⟦B⟧) such that ⟦[π_i]⟧, ⟦[fold]⟧ are maps
of Heyting algebras, ⟦[in_i]⟧ preserves ∨, ⊥ and ∧, while ⟦[next]⟧ preserves ∧, ⊤
and ∨. With ⟦[△]φ⟧ := ⟦[△]⟧(⟦φ⟧), all the axioms and rules of Table 2 are
validated for these modalities. To handle guarded recursion, it is crucial to have
⟦[next]φ⟧ := ▸(⟦φ⟧), with ⟦[next]φ⟧ true at time 1, independently from φ. As a
consequence, [next] and ○ do not validate axiom (P) (Table 2), and ○[hd]φ can
“lie” about the next time step. We let ⟦[box]φ⟧ := Δ({|φ|}).

The modality [ev(ψ)] is a bit more complex. For ψ : B and φ : A, the formula
[ev(ψ)]φ is interpreted as a logical predicate in the sense of [29, §9.2 & Prop.
9.2.4]. The idea is that for a term M : {B → A | [ev(ψ)]φ}, the global section
ev ∘ ⟨⟦M⟧, x⟩ ∈ Γ⟦A⟧ should satisfy φ whenever x ∈ Γ⟦B⟧ satisfies ψ. We refer
to [28, §D] for details.

Our semantics are both correct w.r.t. the full modal theories of Def. 6.2.


Lemma 7.2. If ⊢_c^A φ then {|φ|} = {|⊤|}. If ⊢^A φ then ⟦φ⟧ = ⟦⊤⟧.


The Safe Fragment. For α (positive and) guarded in φ, the internal semantics
of θαφ is somewhat meaningless because S has unique guarded fixpoints [13,
§2.5]. In particular, the typing fix(s).Consᵍ a s : {Strᵍ A | ◇φ} for arbitrary
a : A and φ : Strᵍ A (extending is indeed verified by the S semantics ⟦−⟧.
This prevents us from adequacy w.r.t. the external semantics in general. But
this is possible for safe formulae since in this case we have:


Proposition 7.3. If φ : A is safe then {|φ|} = Γ⟦φ⟧.


Proposition 7.3 gives the subtyping rule {■A | [box]φ} ≡ ■{A | φ} (Fig. 11),
which makes available the comonad structure of ■ on [box]φ when φ is safe.
Recall that in safe formulae, implications can only occur under a [box] modality
and thus in closed subformulae. It is crucial for Prop. 7.3 that infs and sups are
pointwise in the subobject lattices of S, so that conjunctions and disjunctions
are interpreted as with the usual classical Kripke semantics (see e.g. [47, §VI.7]).
This does not hold for implications!

The second key to Prop. 7.3 is the following. For L a complete lattice, a
Scott cocontinuous function L → L is a Scott continuous function L^op → L^op,
i.e. which preserves codirected infs. For a safe α : A ⊢ φ : A, the poset maps ⟦φ⟧ :
Sub(⟦A⟧) → Sub(⟦A⟧) and {|φ|} : P(Γ⟦A⟧) → P(Γ⟦A⟧) are Scott cocontinuous.
The greatest fixpoint ναφ(α) can thus be interpreted, both in Set and S, using
Kleene’s Fixpoint Theorem, as the infs of the interpretations of φᵐ(⊤) for m ∈ N.
This leads to the expected coincidence of the two semantics for safe formulae.

  x ⊩_n {A | φ}    iff  x_n(•) ∈ ⟦φ⟧(n)
  x ⊩_n Fix(X).A   iff  unfold ∘ x ⊩_n A[Fix(X).A/X]
  x ⊩_n T₀ + T₁    iff  ∃i ∈ {0, 1}, ∃y ∈ Γ⟦|T_i|⟧, x = in_i ∘ y and y ⊩_n T_i
  x ⊩_n T₀ × T₁    iff  π₀ ∘ x ⊩_n T₀ and π₁ ∘ x ⊩_n T₁
  x ⊩_n 1
  x ⊩_n U → T      iff  ∀k ≤ n, ∀y ∈ Γ⟦|U|⟧, y ⊩_k U ⟹ ev ∘ ⟨x, y⟩ ⊩_k T
  x ⊩_{n+1} ▸T     iff  ∃y ∈ Γ⟦|T|⟧, x = next ∘ y and y ⊩_n T
  x ⊩_1 ▸T
  x ⊩_n ■T         iff  ∀m > 0, x_n(•) ⊩_m T   (where x ∈ Γ⟦■|T|⟧)
  x ⊩_n ∀k · T     iff  x ⊩_n T[t/k] for all closed iteration terms t


Fig. 13. The Realizability Semantics. 


The Smooth Fragment. The smooth restriction allows for continuity
properties needed to compute fixpoints iteratively, following Kleene’s Fixpoint
Theorem. This implies the correctness of the typing rules (ν-I) and (μ-E) of Fig. 11.


Lemma 7.4. Given a closed smooth ναφ(α) : A (resp. μαφ(α) : A), the
function {|φ|} : P(Γ⟦A⟧) → P(Γ⟦A⟧) is Scott-cocontinuous (resp. Scott-continuous).


We have {|ναφ(α)|} = ⋂_{m∈N} {|φᵐ(⊤)|} (resp. {|μαφ(α)|} = ⋃_{m∈N} {|φᵐ(⊥)|}).


The Realizability Semantics. The correctness of the type system w.r.t. its 
semantics in S is proved with a realizability relation. 


Definition 7.5 (Realizability). Given a type $T$ without free iteration variable,
a global section $x \in \Gamma[\![|T|]\!]$ and $n > 0$, we define the realizability relation $x \Vdash_n T$
by induction on lexicographically ordered pairs $(n, T)$ in Fig. 13.


Lemma 7.6. Given types $T, U$ without free iteration variable, if $x \Vdash_n U$ and
$U \leq T$ then $x \Vdash_n T$.


Theorem 7.7 (Adequacy). If $\vdash M : T$, where $T$ has no free iteration variable,
then $[\![M]\!] \Vdash_n T$ for all $n > 0$.


By Thm. a program $M : B \to A$ induces a set-theoretic function
$\Gamma[\![M]\!] : \Gamma[\![B]\!] \to \Gamma[\![A]\!]$, $x \mapsto [\![M]\!] \circ x$. When $B$ and $A$ are polynomial
(e.g. streams $Str^g B$, $Str^g A$ with $B$, $A$ constant), Møgelberg's Thm. [7.1] says that
$\Gamma[\![M]\!]$ is a function on the usual final coalgebras for $B$, $A$ in Set (e.g. the sets of
usual streams over $B$ and $A$). Moreover, if e.g. $M : \{Str\,B \mid [box]\psi\} \to \{Str\,A \mid [box]\varphi\}$,
then (modulo $\Gamma[\![\blacksquare]\!] \simeq Id_{Set}$) given a stream $x$ that satisfies $\psi$ (i.e. $x \in \{|\psi|\}$) the stream
$\Gamma[\![M]\!](x)$ satisfies $\varphi$ (i.e. $\Gamma[\![M]\!](x) \in \{|\varphi|\}$). See §8 for examples.


8 Examples 


We exemplified basic manipulations of our system in the preceding sections. We give
further examples here. The functions used in our main examples are gathered in Table [3]
with the following conventions. We use the infix notation $a ::^g s$ for $Cons^g\, a\, s$
and write $[]^g$ for the empty colist $Nil^g$. Moreover, we use some syntactic sugar for
pattern matching, e.g. assuming $s : CoList^g A$ we write
case s of ($[]^g \mapsto N \mid x ::^g xs \mapsto M$) for
case(unfold s) of ($y.N[\langle\rangle/y] \mid y.M[\pi_0(y)/x, \pi_1(y)/xs]$). Most of the
append : CoList A → CoList A → CoList A
  := λs.λt. box_ι(append^g (unbox s) (unbox t))
append^g : CoList^g A → CoList^g A → CoList^g A
  := fix(g).λs.λt. case s of
     | []^g ↦ t
     | x ::^g xs ↦ x ::^g (g ⊛ xs ⊛ (next t))

sched : Res A → Res A → Res A
  := λp.λq. box_ι(sched^g (unbox p) (unbox q))
sched^g : Res^g A → Res^g A → Res^g A
  := fix(g).λp.λq. case p of
     | Ret^g a ↦ Ret^g a
     | Cont^g k ↦ let h = λi. let ⟨o, t⟩ = k i
                               in ⟨o, g ⊛ (next q) ⊛ t⟩
                  in Cont^g h

diag := λs. box_ι(diag^g (unbox s)) : Str(Str A) → Str A
diag^g := diagaux^g (λx.x) : Str^g(Str A) → Str^g A
diagaux^g : (Str A → Str A) → Str^g(Str A) → Str^g A
  := fix(g).λt.λs. Cons^g ((hd ∘ t)(hd^g s)) (g ⊛ next(t ∘ tl) ⊛ (tl^g s))

fb : CoNat → CoNat → Str Bool
  := λc.λm. box_ι(fb^g (unbox c) (unbox m))
fb^g : CoNat^g → CoNat^g → Str^g Bool
  := fix(g).λc.λm. case c of
     | Z^g ↦ ff ::^g g ⊛ (next m) ⊛ next(S^g (next m))
     | S^g n ↦ tt ::^g g ⊛ n ⊛ (next m)

extract : Rou^g(CoList^g A) → CoList^g A
  := fix(g).λc. case c of
     | Over^g ↦ Nil^g
     | Cont^g f ↦ f g^⊛
unfold : Rou^g A → (▶Rou^g A → ▶A) → ▶A
  := λc. case c of
     | Over^g ↦ λk. k (next Over^g)
     | Cont^g f ↦ λk. next(f k)

bft^g := λt. extract (bftaux t Over^g) : Tree^g A → CoList^g A
bftaux : Tree^g A → Rou^g(CoList^g A) → Rou^g(CoList^g A)
  := fix(g).λt.λc. Cont^g (λk. (label^g t) ::^g unfold c (k ∘ (g ⊛ (son_l^g t))^⊛ ∘ (g ⊛ (son_r^g t))^⊛))


Table 3. Code of the Examples.


functions of Table [3] are obtained from usual recursive definitions by inserting $\circledast$
and next at the right places. We often write $\psi \mathrel{\Vert\!\to} \varphi$ for $[ev(\psi)]\varphi$. Table [4] recaps
our main examples of refinement typings, all of which (for $A$, $B$, $O$, $I$ constant, $I$
finite and $\varphi$, $\psi$ safe and smooth) can be derived syntactically for the functions of
Table [3]. We use intermediate typings requiring iteration terms whenever a $\Diamond$ is
involved. Below, “$\Gamma[\![M]\!]$ satisfies $\varphi$” means $\Gamma[\![M]\!] \in \{|\varphi|\}$ (modulo $\Gamma[\![\blacksquare]\!] \simeq Id_{Set}$,
see §7). We refer to [28, §E] for details.


Example 8.1 (The Append Function on CoLists). Our system can derive that
$\Gamma[\![append]\!]$ returns a non-empty colist if one of its arguments is non-empty. Using
$\Diamond[nil]$ (which says that a colist is finite), we can derive that $\Gamma[\![append]\!]$ returns a
finite colist if its arguments are both finite. This involves the intermediate typing


$\forall k \cdot \forall \ell \cdot (\{CoList^g A \mid \Diamond^k[nil]\} \to \{CoList^g A \mid \Diamond^\ell[nil]\} \to \{CoList^g A \mid \Diamond^{k+\ell}[nil]\})$


In addition, if the first argument of $\Gamma[\![append]\!]$ has an element which satisfies
$\varphi$, then the result has an element which satisfies $\varphi$. The same holds if the first
argument is finite while the second one has an element which satisfies $\varphi$
[28, §E.6].
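
The set-theoretic behaviour described in Example 8.1 can be observed on a small Python sketch, in which possibly infinite colists are modelled by iterators. This is only an informal analogue of $\Gamma[\![append]\!]$: the generator `append` and the sample inputs are assumptions made for illustration, and nothing here reflects the refinement type derivations themselves.

```python
from itertools import count, islice


def append(s, t):
    """Lazily produce the elements of s, then (if s is finite) those of t."""
    yield from s
    yield from t


# Both arguments finite: the result is finite.
finite = list(append([1, 2], [3]))

# First argument infinite: only a prefix can be observed, t is never reached.
prefix = list(islice(append(count(), ["unreachable"]), 3))

# First argument finite, second one contains an even element:
# the result contains an even element as well.
has_even = any(x % 2 == 0 for x in append([1, 3], [5, 6]))

print(finite, prefix, has_even)
```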
Map over coinductive streams (with $\triangle$ either $\Box$, $\Diamond$, $\Diamond\Box$ or $\Box\Diamond$)
  $map : (\{B \mid \psi\} \to \{A \mid \varphi\}) \to \{Str\,B \mid [box]\triangle[hd]\psi\} \to \{Str\,A \mid [box]\triangle[hd]\varphi\}$

Diagonal of coinductive streams of streams (with $\triangle$ either $\Box$ or $\Diamond\Box$)
  $diag : \{Str(Str\,A) \mid [box]\triangle[hd][box]\Box[hd]\varphi\} \to \{Str\,A \mid [box]\triangle[hd]\varphi\}$

A fair stream of Booleans (adapted from [17,8])
  $fb : CoNat \to CoNat \to Str\,Bool$
  $fb\ 0\ 1 : \{Str\,Bool \mid [box]\Box\Diamond[hd][tt] \wedge [box]\Box\Diamond[hd][ff]\}$

Append on guarded recursive colists
  $append^g : \{CoList^g A \mid [\neg nil]\} \to CoList^g A \to \{CoList^g A \mid [\neg nil]\}$
  $append^g : CoList^g A \to \{CoList^g A \mid [\neg nil]\} \to \{CoList^g A \mid [\neg nil]\}$

Append on coinductive colists
  $append : \{CoList\,A \mid [box]\Diamond[hd]\varphi\} \to CoList\,A \to \{CoList\,A \mid [box]\Diamond[hd]\varphi\}$
  $append : \{CoList\,A \mid [box]\Diamond[nil]\} \to \{CoList\,A \mid [box]\Diamond[hd]\varphi\} \to \{CoList\,A \mid [box]\Diamond[hd]\varphi\}$
  $append : \{CoList\,A \mid [box]\Diamond[nil]\} \to \{CoList\,A \mid [box]\Diamond[nil]\} \to \{CoList\,A \mid [box]\Diamond[nil]\}$

Breadth-first tree traversal (à la [35] or with Hofmann’s algorithm (see e.g. [10]))
  $bft^g : \{Tree^g C \mid \forall\Box[lbl]\vartheta\} \to \{CoList^g C \mid \Box[hd]\vartheta\}$

A scheduler of resumptions (adapted from [44])
  $sched : \{Res\,A \mid [box]\Diamond[Ret]\} \to \{Res\,A \mid [box]\Diamond[Ret]\} \to \{Res\,A \mid [box]\Diamond[Ret]\}$
  $sched : \{Res\,A \mid [box]\Diamond[now]\psi\} \to \{Res\,A \mid [box]\Diamond[now]\psi\} \to \{Res\,A \mid [box]\Diamond[now]\psi\}$
  $sched : \{Res\,A \mid [box]\Box\Diamond[Ret]\} \to \{Res\,A \mid [box]\Box\Diamond[Ret]\} \to \{Res\,A \mid [box]\Box\Diamond[Ret]\}$
  $sched : \{Res\,A \mid [box]\Box\Diamond[out]\vartheta\} \to \{Res\,A \mid [box]\Box\Diamond[out]\vartheta\} \to \{Res\,A \mid [box]\Box\Diamond[out]\vartheta\}$

(where $\Box$ is either $\forall\Box$ or $\exists\Box$, $\Diamond$ is either $\forall\Diamond$ or $\exists\Diamond$, and $[out]$ is either $[\wedge out]$ or $[\vee out]$)


Table 4. Some Refinement Typings (functions defined in Table [3]).


Example 8.2 (The Map Function on Streams). The composite modalities $\Box\Diamond$
and $\Diamond\Box$ over streams are read resp. as “infinitely often” and “eventually always”.
Provided with a function $f : \Gamma[\![B]\!] \to \Gamma[\![A]\!]$ taking $b \in \Gamma[\![B]\!]$ satisfying $\psi$ to
$f(b) \in \Gamma[\![A]\!]$ satisfying $\varphi$, the function $\Gamma[\![map]\!]$ on set-theoretic streams returns
a stream which infinitely often (resp. eventually always) satisfies $\varphi$ if its stream
argument infinitely often (resp. eventually always) satisfies $\psi$ [28, §E.3].


Example 8.3 (The Diagonal Function). Consider a stream of streams $s$. We have
$s = (s_i \mid i \geq 0)$ where each $s_i$ is itself a stream $s_i = (s_{i,j} \mid j \geq 0)$. The diagonal
of $s$ is then the stream $(s_{i,i} \mid i \geq 0)$. Note that $s_{i,i} = hd(tl^i(hd(tl^i(s))))$. Indeed,
$tl^i(s)$ is the stream of streams $(s_k \mid k \geq i)$, so that $hd(tl^i(s))$ is the stream $s_i$ and
$tl^i(hd(tl^i(s)))$ is the stream $(s_{i,k} \mid k \geq i)$. Taking its head thus gives $s_{i,i}$. In the
diag function of Table [3], the auxiliary higher-order function $diagaux^g$ iterates the
coinductive tl over the head of the stream of streams $s$. We write $\circ$ for function
composition, so that assuming $s : Str^g(Str\,A)$ and $t : Str\,A \to Str\,A$, we have (on
the coinductive type $Str\,A$), $(hd^g s) : Str\,A$ and


$(hd \circ t) : Str\,A \to A$        $(hd \circ t)(hd^g s) : A$        $(t \circ tl) : Str\,A \to Str\,A$


The expected refinement types for diag (Table [4]) say that if its argument is a
stream whose component streams all satisfy $\Box\varphi$, then $\Gamma[\![diag]\!]$ returns a stream
whose elements all satisfy $\varphi$. Also, if the argument of $\Gamma[\![diag]\!]$ is a stream such
that eventually all its component streams satisfy $\Box\varphi$, then it returns a stream
which eventually always satisfies $\varphi$. See [28, §E.4] for details.
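
The iteration performed by diagaux can be mimicked in Python. In this sketch a stream is modelled as an index function from natural numbers to values, which is a representation chosen here for convenience and not the one used in the paper; the names `hd`, `tl`, `diagaux`, `diag` and the test stream `grid` are likewise only illustrative. As in Table 3, the accumulator `t` is replaced by `t ∘ tl` at each step while the outer stream advances by one tail.

```python
from itertools import islice


def hd(s):
    """Head of a stream given as an index function."""
    return s(0)


def tl(s):
    """Tail of a stream given as an index function."""
    return lambda i: s(i + 1)


def diagaux(t, s):
    """Yield (hd . t)(hd s), then continue with t . tl and tl s."""
    while True:
        yield hd(t(hd(s)))
        t = (lambda prev: lambda x: prev(tl(x)))(t)  # t := t . tl
        s = tl(s)


def diag(s):
    return diagaux(lambda x: x, s)


# Hypothetical stream of streams whose (i, j) entry is the pair (i, j).
grid = lambda i: (lambda j: (i, j))
first_four = list(islice(diag(grid), 4))
print(first_four)
```

Each step adds one more `tl` to the accumulated function, so the k-th output costs k tail applications; the sketch is meant to show the shape of the recursion, not to be efficient.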


Example 8.4 (A Fair Stream of Booleans). The non-regular stream (fb 0 1),
adapted from [17,8], is of the form $ff \cdot tt \cdot ff \cdot tt^2 \cdot ff \cdots ff \cdot tt^m \cdot ff \cdot tt^{m+1} \cdot ff \cdots$. It thus
contains infinitely many tt’s and infinitely many ff’s. We indeed have (see
[28, §E.5] for details) $(fb\ 0\ 1) : \{Str\,Bool \mid [box]\Box\Diamond[hd][tt] \wedge [box]\Box\Diamond[hd][ff]\}$.
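
The shape of (fb 0 1) can be checked on a prefix with a Python generator that follows the two branches of fb in Table 3, reading conatural numbers as ordinary integers (an assumption that ignores the infinite conatural). The generator name `fb` mirrors the paper; the encoding of tt and ff as `True` and `False` is chosen here.

```python
from itertools import islice


def fb(c, m):
    """Zero branch: emit ff and restart the counter at m, with bound m + 1.
    Successor branch: emit tt and decrement the counter."""
    while True:
        if c == 0:
            yield False
            c, m = m, m + 1
        else:
            yield True
            c -= 1


prefix = list(islice(fb(0, 1), 10))
print(prefix)
```

The runs of `True` between consecutive `False` values have lengths 1, 2, 3 and so on, which is why both values occur infinitely often and the stream is not eventually periodic.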


Example 8.5 (Resumptions). The type of resumptions $Res^g A$ (see Ex. 3.2) is
adapted from [44]. Its guarded constructors are


$Ret^g := \lambda a.\, fold(in_0\, a) : A \to Res^g A$
$Cont^g := \lambda k.\, fold(in_1\, k) : (I \to (O \times \blacktriangleright Res^g A)) \to Res^g A$


$Ret^g(a)$ represents a computation which returns the value $a : A$, while $Cont^g\langle f, k\rangle$
(with $\langle f, k\rangle : I \to (O \times \blacktriangleright Res^g A)$) represents a computation which on input
$i : I$ outputs $f\,i : O$ and continues with $k\,i : \blacktriangleright Res^g A$. Given $p, q : Res^g A$, the
scheduler $(sched^g\ p\ q)$, adapted from [44], first evaluates $p$. If $p$ returns, then
the whole computation returns, with the same value. Otherwise, $p$ evaluates to
say $Cont^g\langle f, k\rangle$. Then $(sched^g\ p\ q)$ produces a computation which on input $i : I$
outputs $f\,i$ and continues with $(sched^g\ q\ (k\,i))$, thus switching arguments.
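
The scheduler just described can be sketched in Python with two small classes standing for the constructors Ret and Cont. The class names follow the paper, but the field names, the helper `countdown` and the driver `run` are hypothetical. The recursive call to `sched` sits inside the closure `h`, so it is only evaluated when an input arrives, which plays the role of the guard in the guarded definition.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Ret:
    value: Any


@dataclass
class Cont:
    # step maps an input i to a pair (output, next resumption)
    step: Callable[[Any], tuple]


def sched(p, q):
    """Run one step of p, then continue with the arguments swapped."""
    if isinstance(p, Ret):
        return Ret(p.value)

    def h(i):
        o, t = p.step(i)
        return o, sched(q, t)

    return Cont(h)


def countdown(n, tag):
    """Hypothetical resumption: n steps echoing the input, then return tag."""
    if n == 0:
        return Ret(tag)
    return Cont(lambda i: (f"{tag}{i}", countdown(n - 1, tag)))


def run(r, inputs):
    """Feed inputs to r until it returns or the inputs are exhausted."""
    outputs = []
    for i in inputs:
        if isinstance(r, Ret):
            break
        o, r = r.step(i)
        outputs.append(o)
    return outputs, r


outputs, final = run(sched(countdown(2, "p"), countdown(1, "q")), [0, 1, 2, 3])
print(outputs, final)
```

In this run the two computations alternate (`p0`, `q1`, `p2`), and the scheduled computation returns as soon as the resumption currently in first position has returned.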

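Let $I$ be a finite base type (so that $Res^g A$ is finitary polynomial). Let $\psi : A$,
$\vartheta : O$ and $\varphi : Res^g A$. We have the following formulae (where $i : I$):


$[Ret] := [fold][in_0]\top$        $[out_i]\vartheta := [fold][in_1]([i] \mathrel{\Vert\!\to} [\pi_0]\vartheta)$
$[now]\psi := [fold][in_0]\psi$        $\bigcirc_i \varphi := [fold][in_1]([i] \mathrel{\Vert\!\to} [\pi_1][next]\varphi)$


The formula $[Ret]$ (resp. $[now]\psi$) holds on a resumption which immediately
returns (resp. with a value satisfying $\psi$) and we have $Ret^g : A \to \{Res^g A \mid [Ret]\}$,
$Ret^g : \{A \mid \psi\} \to \{Res^g A \mid [now]\psi\}$. Moreover, the typings


$Cont^g : \{I \to (O \times \blacktriangleright Res^g A) \mid [i] \mathrel{\Vert\!\to} [\pi_0]\vartheta\} \to \{Res^g A \mid [out_i]\vartheta\}$
$Cont^g : \{I \to (O \times \blacktriangleright Res^g A) \mid [i] \mathrel{\Vert\!\to} [\pi_1][next]\varphi\} \to \{Res^g A \mid \bigcirc_i \varphi\}$


express that $[out_i]\vartheta : Res^g A$ is satisfied by $Cont^g\langle f, k\rangle$ if $f\,i$ satisfies $\vartheta$, and that
$\bigcirc_i \varphi : Res^g A$ is satisfied by $Cont^g\langle f, k\rangle$ if $k\,i$ satisfies $[next]\varphi$. Since $I$ is a finite
base type, it is possible to quantify over its inhabitants. We thus obtain CTL-like
variants of $\Box$ and $\Diamond$ (Ex. [4.3] (b) and Ex. [6.3]). Namely:


$[\wedge out]\vartheta := \bigwedge_{i \in I} [out_i]\vartheta : Res^g A$        $\forall\bigcirc\varphi := \bigwedge_{i \in I} \bigcirc_i \varphi : Res^g A$
$[\vee out]\vartheta := \bigvee_{i \in I} [out_i]\vartheta : Res^g A$        $\exists\bigcirc\varphi := \bigvee_{i \in I} \bigcirc_i \varphi : Res^g A$
$\forall\Box\varphi := \nu\alpha.\ \varphi \wedge \forall\bigcirc\alpha : Res^g A$        $\forall\Diamond\varphi := \mu\alpha.\ \varphi \vee \forall\bigcirc\alpha : Res^g A$
$\exists\Box\varphi := \nu\alpha.\ \varphi \wedge \exists\bigcirc\alpha : Res^g A$        $\exists\Diamond\varphi := \mu\alpha.\ \varphi \vee \exists\bigcirc\alpha : Res^g A$


Our system can prove that $\Gamma[\![sched]\!]$ returns in finite time when so do its
arguments, either along some or along any sequence of inputs. We moreover have
expected $\Box\Diamond$ properties for all possible (consistent) combinations of $\exists/\forall$ and
$[Ret]/[\vee out]/[\wedge out]$ (Table [4], with $\psi : A$, $\vartheta : O$ safe and smooth) [28, §E.7].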
Example 8.6 (Breadth-First Traversal). The function $bft^g$ of Table [3] (where $g^{\circledast}$
stands for $\lambda x.\, g \circledast x$) implements Martin Hofmann’s algorithm for breadth-first
tree traversal. This algorithm involves the higher-order type $Rou^g A$ (see Ex. 3.2)
with constructors $Over^g := fold(in_0\langle\rangle) : Rou^g A$ and


$Cont^g := \lambda f.\, fold(in_1\, f) : ((\blacktriangleright Rou^g A \to \blacktriangleright A) \to A) \to Rou^g A$

We refer to [10] for explanations. Consider a formula $\varphi : A$. We can lift $\varphi$ to

$[Rou]\varphi := \nu\alpha.\ [fold][in_1](([next]\alpha \mathrel{\Vert\!\to} [next]\varphi) \mathrel{\Vert\!\to} \varphi) : Rou^g A$


We then easily derive the expected refinement type of $bft^g$ (Table [4], where $\vartheta : C$).
Assume that $\vartheta$ is safe. On the one hand it is not clear what the meaning of $[Rou]\vartheta$
is, because it is an unsafe formula over a non-polynomial type. On the other
hand, the type of $bft^g$ in Table [4] has its standard expected meaning (namely: if
all nodes of a tree satisfy $\vartheta$ then so do all elements of its traversal) because the
types $Tree^g C$, $CoList^g C$ are polynomial and the formulae $\forall\Box[lbl]\vartheta$, $\Box[hd]\vartheta$ are
safe. Hence, our system can prove standard statements via detours through
non-standard ones, which illustrates its compositionality. We have the same typing
for a usual breadth-first tree traversal with forests (à la [35]). See [28, §E.8].


9 Related Work 


Type systems based on guarded recursion have been designed to enforce
properties of programs handling coinductive types, like causality [45],
productivity [5,50,18,6,25,24], or termination [62]. These properties are captured by the
type systems, meaning that all well-typed programs satisfy these properties.

In an initially different line of work, temporal logics have been used as type
systems for functional reactive programming (FRP), starting from LTL to
the intuitionistic modal $\mu$-calculus [17]. These works follow the Curry-Howard
“proofs-as-programs” paradigm, and reflect in the programming languages the
constructions of the temporal logic.

The FRP approach has been adapted to guarded recursion, e.g. for the absence
of space leaks [44], or the absence of time leaks, with the Fitch-style system
of [7]. This more recently led [8] to consider liveness properties with an FRP
approach based on guarded recursion. In this system, the guarded $\lambda$-calculus
(presented in a Fitch-style type system) is extended with a delay modality (written
$\bigcirc$) together with an “until type” A Until B. Following the Curry-Howard
correspondence, A Until B is eliminated with a specific recursor, based on the usual
unfolding of Until in LTL, and distinct from the guarded fixpoint operator.

In these Curry-Howard approaches, temporal operators are wired into the 
structure of types. This means that there is no separation between the program 
and the proof that it satisfies a given temporal property. Different type formers 
having different program constructs, different temporal specifications for the 
same program may lead to different actual code. 

We have chosen a different approach, based on refinement types, with which 
the structure of formulae is not reflected in the structure of types. This allows 
for our examples to be mostly written in a usual guarded recursive fashion (see
Table [3]). Of course, we do use the modality $\blacksquare$ at the type level as a separation
between safety and liveness properties. But different liveness properties (e.g. $\Diamond$,
$\Diamond\Box$, $\Box\Diamond$) are uniformly handled with the same $\blacksquare$-type, which is moreover the
expected one in the guarded $\lambda$-calculus [18].

Higher-order model checking (HOMC) [5489] has been introduced to check
automatically that higher-order recursion schemes, a simple form of higher-order
programs with finite data-types, satisfy a $\mu$-calculus formula. Automatic
verification of higher-order programs with infinite data-types (integers) has been
explored for safety [40], termination [46], and more generally $\omega$-regular [51]
properties. In presence of infinite datatypes, semi-automatic extensions of HOMC
have recently been proposed [69]. In contrast with this paper, most HOMC
approaches do not consider input-output behaviors on coalgebraic data. A notable
exception is [41,23], but it does not handle higher-order functions (such as map),
nor polynomial types such as Str(Str A) or non-positive types such as
Rou A, and it imposes a strong linearity constraint on pattern matching.

Event-driven approaches consider effects generating streams of events [61],
which can be checked for temporal properties with algorithms based on (HO)MC
[26,27], or, in presence of infinite datatypes, with refinement type systems [42,53].
Our iteration terms can be seen as oracles, as required by [42] to handle liveness
properties, but we do not know if they allow for the non-regular specifications
of [53]. While such approaches can handle infinite data types with good levels of
automation, they do not have coinductive types nor branching time properties,
such as the temporal specification of sched on resumptions (Ex. 8.5).

Along similar lines, branching was approached via non-determinism in [64],
which also handles universal and existential properties on traces. This
framework can handle CTL-like properties of the form $\exists/\forall$-$\Box/\Diamond$ (with our notation
of Ex. 8.5), but not nested combinations of these (as e.g. $\exists\Box\forall\Diamond$ for sched in
Ex. 8.5). It moreover does not handle coinductive types.


10 Conclusion and Future Work 


We have presented a refinement type system for the guarded $\lambda$-calculus, with
refinements expressing temporal properties stated as (alternation-free) $\mu$-calculus
formulae. As we have seen, the system is general enough to prove precise
behavioral input/output properties of coinductively-typed programs. Our main
contribution is to handle liveness properties in presence of guarded recursive types.
As we have seen, this comes with inherent difficulties. In general, once guarded
recursive functions are packed into coinductive ones using $\blacksquare$, the logical
reasoning is made in our system directly on top of programs, following their shape,
but requiring no further modification. We thus believe that we have achieved some
separation between programs and proofs.

We provided several examples. While they demonstrate the flexibility of our
system, they also show that more abstraction would be welcome when proving
liveness properties. In addition, our system lacks expressiveness to prove e.g.
liveness properties on breadth-first tree traversals.

We believe that our approach could be generalized to other programming
languages with inductive or coinductive types. The key requirements are: (1)
modalities in the temporal logic to navigate through the types of the language,
(2) a semantics to indicate when a program satisfies a formula of the temporal
logic, which is sufficiently close to the set-theoretic one for liveness
properties to get their expected meaning, and (3) inference rules to reason over this
realizability semantics.

Extensions of the guarded $\lambda$-calculus with dependent types have been
explored [401624]. It may be possible to extend our work to these systems. This
would require working in a Fitch-style presentation of the $\blacksquare$ modality, as in [712],
since it is not known how to extend delayed substitutions to dependent types
while retaining decidability of type-checking [15]. Also, it is appealing to
investigate the generalization of our approach to sized types [1], in which guarded
recursive types are representable [67].

We plan to investigate type checking. For instance, in a decidable fragment
like the $\mu$-calculus on streams, one can check that a function of type
$\{Str^g C \mid \Box\Diamond[hd]\vartheta\} \to \{Str^g B \mid \Diamond\Box[hd]\psi\}$ can be postcomposed with one of
type $\{Str^g B \mid \Box\Diamond[hd]\psi\} \to \{Str^g A \mid \Box\Diamond[hd]\varphi\}$ (since $\Diamond\Box[hd]\psi \Rightarrow \Box\Diamond[hd]\psi$).
Hence, we expect that some automation is possible for fragments of our logic. In
presence of iteration terms, arithmetic extensions of the $\mu$-calculus may
provide interesting backends. Another direction is the interaction with HOMC.
If (say) a stream over A is representable in a suitable format, one may use HOMC
to check whether it can be an argument of a function expecting e.g. a stream of
type $\{Str^g A \mid \Box\Diamond[hd]\varphi\}$. This might provide automation for fragments of the
guarded $\lambda$-calculus. Besides, the combination of refinement types with automatic
techniques like predicate abstraction [57], abstract interpretation [84], or SMT
solvers [66,65] has been particularly successful. More recently, the combination
of refinement type inference with HOMC has been investigated [59].

We would like to explore temporal specification of general, effectful programs. 
To do so, we wish to develop the treatment of the coinductive resumptions 
monad [55], that provides a general framework to reason on effectful computa- 
tions, as shown by interaction trees [70]. It would be interesting to study tem- 
poral specifications we could give to effectful programs encoded in this setting. 
To formalize reasoning on such examples, we would like to design an embedding 
of our system in a proof assistant like Coq. 

Following [3], guarded recursion has been used to abstract the reasoning on
step-indexing [4] that has been used to design Kripke Logical Relations [2] for
typed higher-order effectful programming languages. Program logics for
reasoning on such logical relations use this representation of step-indexing via
guarded recursion. It is also found in Iris, a framework for higher-order
concurrent separation logic. It would be interesting to explore the incorporation of
temporal reasoning, especially liveness properties, in such logics.
Abstract. Language-integrated query based on comprehension syntax
is a powerful technique for safe database programming, and provides a 
basis for advanced techniques such as query shredding or query flatten- 
ing that allow efficient programming with complex nested collections. 
However, the foundations of these techniques are lacking: although SQL, 
the most widely-used database query language, supports heterogeneous 
queries that mix set and multiset semantics, these important capabili- 
ties are not supported by known correctness results or implementations 
that assume homogeneous collections. In this paper we study
language-integrated query for a heterogeneous query language $NRC_\lambda(Set, Bag)$
that combines set and multiset constructs. We show how to normalize 
and translate queries to SQL, and develop a novel approach to querying 
heterogeneous nested collections, based on the insight that “local” query 
subexpressions that calculate nested subcollections can be “lifted” to the 
top level analogously to lambda-lifting for local function definitions. 


Keywords: language-integrated query - nested relations - multisets 


1 Introduction 


Since the rise of relational databases as important software components in the 
1980s, it has been widely appreciated that database programming is hard [13]. 
Databases offer efficient access to flat tabular data using declarative SQL queries, 
a computational model very different from that of most general-purpose lan- 
guages. To get the best performance from the database, programmers typically 
need to formulate important parts of their program’s logic as queries, thus effec- 
tively programming in two languages: their usual general-purpose language (e.g. 
Java, Python, Scala) and SQL, with the latter query code typically constructed 
as unchecked, dynamic strings. Programming in two languages is more than 
twice as difficult as programming in one language [35]. The result is a hybrid 
programming model where important parts of the program’s functionality are 
not statically checked and may lead to run-time failures, or worse, vulnerabilities 
such as SQL injection attacks. This undesirable state of affairs was recognized
by Copeland and Maier [13] who coined the term impedance mismatch for it. 


Though higher-level wrapper libraries and tools such as object-relational map- 
pings (ORM) can help ameliorate the impedance mismatch, they often come at 
a price of performance and lack of transparency, as high-level operations on in- 
memory objects representing database data are not always mapped efficiently to 
queries [45]. An alternative approach, which has almost as long a history as the 
impedance mismatch problem itself, is to elevate queries in the host language 
from unchecked strings to a typed, domain-specific sublanguage, whose interac- 
tions with the rest of the program can be checked and which can be mapped 
to database queries safely while providing strong guarantees. This approach is 
nowadays typically called language-integrated query following Microsoft’s suc- 
cessful LINQ extensions to .NET languages such as C# and F# [36,49]. It is 
ultimately based on Trinder and Wadler’s insight that database queries can be 
modeled by a form of monadic comprehension syntax [50]. 


Comprehension-based query languages were placed on strong foundations 
in the database community in the 1990s [3,4,40,55,33]. A key insight due to 
Paredaens and van Gucht [40] is that although comprehension-based queries can 
manipulate nested collections, any expression whose input and output are flat 
collections (i.e. tables of records without other collections nested inside field val- 
ues) can always be translated to an equivalent query only using flat relations (i.e. 
can be expressed in an SQL-like language). Wong [55] subsequently generalized 
this result and gave a constructive proof, in which the translation from nested 
to flat queries is accomplished through a strongly normalizing rewriting system. 


Wong’s work has informed a number of successful implementations, such 
as the influential Kleisli system [56] for biomedical data integration, and the 
Links programming language [12]. Although the implementation of LINQ in 
C# and F# was not directly based on normalization, Cheney et al. [7] showed 
that normalization can be performed as a pre-processing step to improve both 
reliability and performance of queries, and guarantee that a well-formed query 
expression evaluates to (at most) one equivalent SQL expression at run time. 


Comprehension-based language-integrated query also forms the basis for li- 
braries such as Quill for Scala [41] and Database-Supported Haskell [21]. Most 
recently, language-integrated query has been extended further to support
efficient execution of queries that construct nested results [25,8,21,53], by
translating such queries to a bounded number of flat queries. This technique, currently
implemented in Links and DSH, has several benefits, for example making it possible to
implement provenance-tracking efficiently in queries [17,47]. Fowler et al. [19] showed that in
some cases, Links’s support for nested query results decreased both the number 
of queries issued and the total query evaluation time by an order of magnitude 
or more compared to a Java database application. Unfortunately, there is still a 
gap between the theory and practice of language-integrated query. Widely-used 
and practically important SQL features that mix set and multiset collections, 
such as duplicate elimination, are supported by some implementations, but with- 
out guarantees regarding correctness or reliability. So far, such results have only 
been proved for special cases [7,8], typically for homogeneous queries operating
on one uniform collection type. For example, in Links, queries have multiset se- 
mantics and cannot use duplicate elimination or set-valued operations. To the 
best of our knowledge the questions of how to correctly translate flat or nested 
heterogeneous queries to SQL are open problems. 

In this paper, we solve both open problems. We study a heterogeneous query
language $NRC_\lambda(Set, Bag)$, which was introduced and studied in our recent
work [42]. We have previously extended the key results on query normalization
to $NRC_\lambda(Set, Bag)$ [43], but unlike the homogeneous case, the resulting
normal forms do not directly correspond to SQL. In this paper, we first show how
flat $NRC_\lambda(Set, Bag)$ queries can be translated to SQL, and we then develop
a new approach for evaluating queries over nested heterogeneous collections.
The key (and, to us at least, surprising) insight is to recognize that these two
subproblems are really just different facets of one problem. That is, when
translating flat $NRC_\lambda(Set, Bag)$ queries to SQL, the main obstacle is how to deal
with query expressions that depend on local variables; when translating nested
$NRC_\lambda(Set, Bag)$ queries to equivalent flat ones, the main obstacle is also how
to deal with query expressions that depend on local variables. We solve this 
problem by observing that such query subexpressions can be lifted, analogously 
to lambda-lifting of local function definitions in functional programming [30], by 
abstracting over their free variables. Differently to lambda-lifting, however, we 
lift such expressions by converting them to tabular functions, or graphs, which 
can be calculated using database query constructs. 

The remainder of this paper presents our contributions as follows: 


— In section 2 we review the most relevant prior work and present our approach 
at a high, and we hope accessible, level. 

— In sections 3 and 4 we present the core languages $NRC_\lambda(Set, Bag)$ and
$NRC_G$ which will be used in the rest of the paper.

— Section 5 presents our results on translation of flat $NRC_\lambda(Set, Bag)$ queries
to SQL, via $NRC_G$.

— Section 6 presents our results on translation of $NRC_\lambda(Set, Bag)$ queries that
construct nested results to a bounded number of flat $NRC_G$ queries.

— Sections 7 and 8 discuss related work and conclude. 


2 Overview 


In this section we sketch our approach. We use Links syntax [12], which differs 
in superficial respects from the core calculus in the rest of the paper but is more 
readable. We rely without further comment on existing capabilities of language- 
integrated query in Links, which are described elsewhere [11,34,8]. Suppose, hy- 
pothetically, we are interested in certain presidential candidates and prescription 
drugs they may be taking³. In Links, an expression querying a small database of
presidential candidates and their drug prescriptions can be written as follows: 


3 For example, to see whether drug interactions might explain erratic behavior such 
as rage tweeting, creeping authoritarianism, or creepiness more generally. 
Fig. 1. Input tables Cand, Pres, Drug, intermediate result of Qr and result of Q1.


Q0 = for (c <- Cand, p <- Pres, d <- Drug)
     where (c.cid == p.cid && p.did == d.did)
     [(name=c.name, drug=d.drug)]


Some (totally fictitious and not legally actionable) example data is shown in 
Figure 1; note that the prescriptions table Pres is a multiset containing duplicate 
entries. Executing this query in Links results in the following SQL query: 


SELECT c.name, d.drug 
FROM Cand c, Pres p, Drug d 
WHERE c.cid = p.cid AND p.did = d.did 
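
To make the multiset behaviour of this query concrete, the following Python sketch runs it on an in-memory SQLite database. The table contents and identifiers are hypothetical: they are chosen only so that the prescriptions table contains one duplicate row, and they are not taken from Figure 1. SQLite and the `sqlite3` module are used purely as a convenient engine and are unrelated to the Links implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; Pres contains the row (1, 1) twice.
conn.executescript("""
    CREATE TABLE Cand (cid INTEGER, name TEXT);
    CREATE TABLE Drug (did INTEGER, drug TEXT);
    CREATE TABLE Pres (cid INTEGER, did INTEGER);
    INSERT INTO Cand VALUES (1, 'DJT'), (2, 'JRB');
    INSERT INTO Drug VALUES (1, 'adderall'), (2, 'hydrochloroquine'), (3, 'caffeine');
    INSERT INTO Pres VALUES (1, 1), (1, 1), (1, 2), (2, 3);
""")

Q0 = """
    SELECT c.name, d.drug
    FROM Cand c, Pres p, Drug d
    WHERE c.cid = p.cid AND p.did = d.did
"""

# Multiset semantics: the duplicate prescription yields a duplicate result row.
rows = sorted(conn.execute(Q0).fetchall())

# Set semantics at the top level: DISTINCT removes the duplicate.
distinct_rows = sorted(
    conn.execute(Q0.replace("SELECT", "SELECT DISTINCT", 1)).fetchall()
)

print(rows)
print(distinct_rows)
```

The rows are sorted only to make the comparison independent of the order in which the engine returns them.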


In Links, query results from the database are mapped back to list values
nondeterministically, and the result of the above query Q0 will be a list
containing two copies of the tuple (DJT, adderall) and one copy of each of the tuples
(DJT, hydrochloroquine) and (JRB, caffeine). If we are just interested in which 
candidates take which drugs and not how many times each drug was taken, we 
want to remove these duplicates. This can be accomplished in a basic SQL query 
using the DISTINCT keyword after SELECT. Currently, in Links there is no way 
to generate queries involving DISTINCT, and this duplicate elimination can only 
be performed in-memory. While this is not hard to do when the duplicate elimi- 
nation happens at the end of the query, it is not as clear how to handle dedupli- 
cation operations correctly in arbitrary places inside queries. Furthermore, SQL 
has several other operations that can have either set or multiset semantics such 
as UNION and EXCEPT: how should they be handled? 

To study this problem we introduced a core calculus $NRC_\lambda(Set, Bag)$ [42]
(reviewed in the next section) in which there are two collection types, sets and 
multisets (or bags); duplicate elimination maps a multiset to a set with the same 
elements, and promotion maps a set to the least multiset with the same elements. 

We considered, but were not previously able to solve, two problems in the
context of $NRC_\lambda(Set, Bag)$ which are addressed in this paper. First, the
fundamental results regarding normalization and translation to SQL have been
studied only for homogeneous query languages with collections consisting of
either sets, bags, or lists. We recently extended the normalization results to
$NRC_\lambda(Set, Bag)$ [43], but the resulting normal forms do not correspond directly
to SQL queries if operations such as deduplication, promotion, or bag difference 
are present. Second, query expressions that construct nested collections cannot 
be translated directly to SQL and can be very expensive to execute in-memory 
using nested loops, leading to the N+1 query problem (or query avalanche
problem [26]) in which one query is performed for the outer loop and then another N
queries are performed, one per iteration of the inner loop. Some techniques have 
been developed for translating nested queries to a fixed number of flat queries, 
but to date they either handle only homogeneous set or bag collections [54,8], 
or lack detailed correctness proofs [26,52]. 

Regarding the first problem, the closest work in this respect is by Libkin 
and Wong [33], who studied and related the expressiveness of comprehension- 
based homogeneous set and bag query languages but did not consider their 
heterogeneous combination or translation to SQL. The following query illustrates 
the fundamental obstacle: 


Q1 = for (c <- Cand)
       for (d <- dedup(for (p <- Pres, d <- Drug)
                       where (c.cid == p.cid && p.did == d.did)
                       [d.drug]))
         [(name=c.name, drug=d)]


This query is similar to Q0, but eliminates duplicates among the drugs for each
candidate. The query contains a duplicate elimination operation (dedup) applied 
to another query subexpression that refers to c, which is introduced in an earlier 
generator. This is not directly supported in classic SQL: by default the subqueries 
in FROM clauses cannot refer to tuple variables introduced by earlier parts of the 
FROM clause. In fact, this query is expressible in SQL:1999 using the LATERAL 
keyword, which does allow such sideways information-passing: 


SELECT c.name,d.drug 
FROM Cand c, LATERAL (SELECT DISTINCT d.drug 
FROM Pres p, Drug d 
WHERE p.cid = c.cid AND p.did = d.did) d 


(Without the LATERAL keyword, this query is not well-formed SQL.) However, 
such queries have only recently become widely supported, so are not available on 
legacy databases, and even when supported, are not typically optimized effec- 
tively; for example PostgreSQL will evaluate it as a nested loop, with quadratic 
complexity or worse. 

Regarding the second problem, Van den Bussche [54] showed that any query 
returning nested set collections can be simulated by n flat queries, where n is 
the number of occurrences of the set collection type in the result. However, 
this translation has not been used as the basis for a practical system to our 
knowledge, and does not respect multiset semantics. Cheney et al. [8] provided 
an analogous shredding translation for nested multiset queries, but translated to 
a richer target language (including SQL:1999 features such as ROW_NUMBER) and 
did not handle operations such as multiset difference or duplicate elimination. 
Thus, neither approach handles the full expressiveness of a heterogeneous query 
language over bags and sets. The following query illustrates the fundamental 
obstacle: 

Q2 = for (x <- Cand)
       [(name=x.name, drugs=dedup(for (p <- Pres, d <- Drug)
                                  where (x.cid == p.cid and p.did == d.did)
                                  [d.drug]))]


Much like Q1, Q2 builds a multiset of pairs (name, drugs) but here drugs is a 
set of all of the drugs taken by candidate name. Such a query is, of course, not 
even syntactically expressible in SQL because it returns a nested collection; it is 
not expressible in previous work on nested query evaluation either, because the 
result is a multiset of records, one component of which is a set. 

We will now illustrate how to translate Q1 to a plain SQL query (not using
LATERAL), and how to translate Q2 to two flat queries such that the nested
result can be constructed easily from their flat results. First, note that we can
rewrite both queries as follows, introducing an abbreviation F(x) for a query
subexpression parameterized by x:


F(x) = for (p <- Pres, d <- Drug)
         where (x.cid == p.cid and p.did == d.did)
         [d.drug]
Q1 = for (c <- Cand) for (d <- dedup(F(c))) [(name=c.name, drug=d)]
Q2 = for (c <- Cand) [(name=c.name, drugs=dedup(F(c)))]


Next, observe that the set of all possible values for x appearing in some call to
F(x) is finite, and can even be computed by a query. Therefore, we can write a
closed query Q_F that builds a lookup table that calculates the graph of F (or
at least, as much of it as is needed to evaluate the queries) as follows:


Q_F = dedup(for (x <- Cand, y <- F(x)) [(in=x, out=y)])


Notice that the use of deduplication here is essential to define Q_F correctly:
if we did not deduplicate then there would be repeated tuples in Q_F, leading to
incorrect results later. If we inline and simplify F(x) in the above query, we get
the following:


Q_F' = dedup(for (x <- Cand, y <- Pres, z <- Drug)
             where (x.cid == y.cid && y.did == z.did)
             [(in=x, out=z.drug)])


Finally we may replace the call to F(x) in Q1 with a lookup to Q_F', as follows:


Q1' = for (c <- Cand, f <- Q_F') where (c == f.in)
        [(name=c.name, drug=f.out)]


This expression may now be translated directly to SQL, because the argument 
to dedup is now closed: 


SELECT c.name,f.drug 
FROM Cand c, (SELECT DISTINCT x.name,x.cid,z.drug 
FROM Cand x, Pres y, Drug z 
WHERE x.cid = y.cid AND y.did = z.did) f 
WHERE c.cid = f.cid AND c.name = f.name 

Q21:  name | drugs
      DJT  | (DJT,45)
      JRB  | (JRB,46)

Q22:  in       | out
      (DJT,45) | hydrochloroquine
      (DJT,45) | adderall
      (JRB,46) | caffeine

Q2:   name | drugs
      DJT  | {hydrochloroquine, adderall}
      JRB  | {caffeine}

Fig. 2. Intermediate results of Q21, Q22 and result of Q2.


Although this query looks a bit more complex than the one given earlier using
LATERAL, it can be optimized more effectively: for example, PostgreSQL generates
a query plan that uses a hash join, giving quasi-linear complexity.
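As a minimal sketch of the delateralized query in action, the following Python script runs it on an in-memory SQLite database. The sample rows are hypothetical (they are not taken from the paper), and SQLite is used only because it ships with Python; nothing here says anything about the query plan a particular engine would choose.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the running example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cand (cid INTEGER, name TEXT);
    CREATE TABLE Pres (cid INTEGER, did INTEGER);
    CREATE TABLE Drug (did INTEGER, drug TEXT);
    INSERT INTO Cand VALUES (45, 'DJT'), (46, 'JRB');
    INSERT INTO Drug VALUES (1, 'hydrochloroquine'), (2, 'adderall'), (3, 'caffeine');
    -- Candidate 45 has two prescriptions for drug 2, so deduplication matters.
    INSERT INTO Pres VALUES (45, 1), (45, 2), (45, 2), (46, 3);
""")

# The closed subquery f plays the role of the lookup table Q_F';
# the outer WHERE clause performs the lookup c == f.in.
DELATERALIZED = """
    SELECT c.name, f.drug
    FROM Cand c, (SELECT DISTINCT x.name, x.cid, z.drug
                  FROM Cand x, Pres y, Drug z
                  WHERE x.cid = y.cid AND y.did = z.did) f
    WHERE c.cid = f.cid AND c.name = f.name
"""

rows = sorted(conn.execute(DELATERALIZED).fetchall())
for name, drug in rows:
    print(name, drug)
```

Each (name, drug) pair appears once per candidate row, even though the duplicated prescription would otherwise produce a repeated tuple.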

On the other hand, to deal with Q2, we refactor it into two closed, flat queries
Q21, Q22 and an expression Q2' that builds the nested result from their flat results
(illustrated in Figure 2):


Q21 = for (x <- Cand) [(name=x.name, drugs=x)]
Q22 = Q_F
Q2' = for (x <- Q21)
        [(name=x.name,
          drugs=for (y <- Q22) where (x.drugs == y.in) [y.out])]


Notice that in Q21 we replaced the call to F with its argument x, while Q22
is just Q_F again. The final expression Q2' builds the nested result (in the host
language’s memory) by traversing Q21 and computing the set value of each drugs
field by looking up the appropriate values from Q22. Thus, the original query
result can be computed by first evaluating Q21 and Q22 on the database, and 
then evaluating the final stitching query expression in-memory. (In practice, as 
discussed in Cheney et al. [8], it is important for performance to use a more 
sophisticated stitching algorithm than the above naive nested loop, but in this 
paper we are primarily concerned with the correctness of the transformation.) 
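The stitching step can be sketched in Python as follows. This is an illustration only, not the algorithm of Cheney et al. [8] or the Links implementation: the function name `stitch`, the dictionary-based row representation, and the sample rows are all assumptions. It indexes the rows of Q22 by their `in` field once, rather than rescanning Q22 for every row of Q21 as the naive nested loop would.

```python
from collections import defaultdict

def stitch(q21_rows, q22_rows):
    """Rebuild the nested result of Q2 from the flat results of Q21 and Q22."""
    # Build a one-pass index from each `in` value to the set of its `out` values.
    index = defaultdict(set)
    for row in q22_rows:
        index[row["in"]].add(row["out"])
    # The outer collection stays a bag (a list); each drugs field becomes a set.
    return [
        {"name": r["name"], "drugs": frozenset(index.get(r["drugs"], ()))}
        for r in q21_rows
    ]

# Hypothetical flat results, shaped like the tables of Q21 and Q22.
q21 = [
    {"name": "DJT", "drugs": ("DJT", 45)},
    {"name": "JRB", "drugs": ("JRB", 46)},
]
q22 = [
    {"in": ("DJT", 45), "out": "hydrochloroquine"},
    {"in": ("DJT", 45), "out": "adderall"},
    {"in": ("JRB", 46), "out": "caffeine"},
]

nested = stitch(q21, q22)
```

A candidate with no matching rows in Q22 simply receives the empty set.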

The above examples are a bit simplistic, but illustrate the key idea of query
lifting. In the rest of this paper we place this approach on a solid foundation
and, partially inspired by Gibbons et al. [20], to help clarify the reasoning we
extend the calculus with a type of tabulated functions or graphs
$\vec{\sigma} \rightharpoonup \{\tau\}$, with graph abstraction introduction form
$\mathcal{G}(-;-)$ and graph application $M \circledast (\vec{N})$. In
our running example we could define $Q_F = \mathcal{G}(x \leftarrow R; F(x))$, and we would use
the application operation $M \circledast (x)$ to extract the set of elements corresponding
to x in $Q_F$. We will also consider tabular functions that return multisets rather
than sets, in order to deal with queries that return nested multisets.


3 Background 


We recap the main points from [42], which introduced a calculus
$NRC_\lambda(Set, Bag)$ with the following syntax:
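
\[
\begin{array}{lrcl}
\text{Types} & \sigma, \tau & ::= & b \mid \langle \vec{\ell} : \vec{\sigma} \rangle \mid \{\sigma\} \mid \lbag \sigma \rbag \mid \sigma \to \tau \\
\text{Terms} & M, N & ::= & x \mid t \mid c(\vec{M}) \mid \langle \vec{\ell} = \vec{M} \rangle \mid M.\ell \mid \lambda x.M \mid M\ N \\
 & & \mid & \emptyset \mid \{M\} \mid M \cup N \mid \bigcup\{M \mid \Theta\} \\
 & & \mid & \mho \mid \lbag M \rbag \mid M \uplus N \mid M - N \mid \biguplus\lbag M \mid \Theta \rbag \\
 & & \mid & \delta M \mid \iota M \mid M\ \mathsf{where}_{set}\ N \mid M\ \mathsf{where}_{bag}\ N \\
 & & \mid & \mathsf{empty}_{set}(M) \mid \mathsf{empty}_{bag}(M) \\
\text{Generators} & \Theta & ::= & \overrightarrow{x \leftarrow M}
\end{array}
\]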

We distinguish between (local) variables x and (global) table names t, and
assume standard primitive types b and primitive operations $c(\vec{M})$ including
respectively Booleans $\mathbb{B}$ and equality at every base type. The syntax for records
and record projection $\langle \vec{\ell} = \vec{M} \rangle$, $M.\ell$, and for lambda-abstraction and application
$\lambda x.M$, $M\ N$ is standard; as usual, let-binding is definable. Set operations include
empty set $\emptyset$, singleton construction $\{M\}$, union $M \cup N$, one-armed conditional
$M\ \mathsf{where}_{set}\ N$, emptiness test $\mathsf{empty}_{set}(M)$, and comprehension $\bigcup\{M \mid \Theta\}$,
where $\Theta$ is a sequence of generators $x \leftarrow M$. Similarly, multiset operations
include empty bag $\mho$, singleton $\lbag M \rbag$, bag union $M \uplus N$, bag difference $M - N$,
conditional $M\ \mathsf{where}_{bag}\ N$, emptiness test $\mathsf{empty}_{bag}(M)$. The syntax is
completed by duplicate elimination $\delta M$ (converting a bag M into a set with the
same object type) and promotion $\iota M$ (which produces the bag containing all
the elements of the set M, with multiplicity 1).

The one-way conditional operations $M\ \mathsf{where}_{set}\ N$ and $M\ \mathsf{where}_{bag}\ N$
evaluate Boolean test N, and return collection M if N is true, otherwise the
empty set/bag; two-way conditionals can be supported without problems. Other
set operations, such as intersection, membership, subset, and equality are also
definable, as are bag operations such as intersection [4,33]. Also, we may define
$\mathsf{empty}_{bag}(M)$ as $\mathsf{empty}_{set}(\delta(M))$ and $M\ \mathsf{where}_{set}\ N$ as $\delta(\iota(M)\ \mathsf{where}_{bag}\ N)$,
but we prefer to include these constructs as primitives for symmetry. Generally,
we will allow ourselves to write M where N and empty(M) without subscripts
if the collection kind of these operations is irrelevant or made clear by the context.
We freely use syntax for unlabeled tuples $\langle \vec{M} \rangle$, $M.i$ and tuple types $\langle \vec{\sigma} \rangle$ and
consider them to be syntactic sugar for labeled records.

The typing rules for the calculus are standard and provided in the full version 
of this paper [44]. For the purposes of this discussion, we will highlight two 
features of the type system. The first is that the calculus used here differs from 
our previous work by using constants and table names, whose types are described 
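by a fixed signature $\Sigma$:

\[
\frac{\Sigma(c) = \vec{b} \to b' \qquad (\Gamma \vdash M_i : b_i)_{i=1,\ldots,n}}{\Gamma \vdash c(\vec{M}) : b'}
\qquad
\frac{\Sigma(t) = \vec{\ell} : \vec{b}}{\Gamma \vdash t : \lbag \langle \vec{\ell} : \vec{b} \rangle \rbag}
\]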


As usual, a typing judgment $\Gamma \vdash M : \sigma$ states that a term M is well-typed
of type $\sigma$, assuming that its free variables have the types declared in the typing
context $\Gamma = x_1 : \sigma_1, \ldots, x_k : \sigma_k$. For the two rules above, note in particular that
the primitive functions c can only take inputs of base type and produce results
at base type, and table constants t are always multisets of records where the
fields are of base type. We refer to a type of the form $\langle \vec{\ell} : \vec{b} \rangle$ as flat; if $\sigma$ is flat,
we refer to $\{\sigma\}$ and $\lbag \sigma \rbag$ as flat collection types.

The second is that our type system uses an approach à la Church, meaning 
that variable abstractions (in lambdas/comprehensions), empty sets and empty 
bags are annotated with their type in order to ensure the uniqueness of typing. 


Lemma 1. In $NRC_\lambda(Set, Bag)$, if $\Gamma \vdash M : \sigma$ and $\Gamma \vdash M : \tau$, then $\sigma = \tau$.


In the context of a larger language implementation, most of these type anno- 
tations can be elided and inferred by type inference. We have chosen to dispense 
with these details in the main body of this paper to avoid unnecessary syntactic 
cluttering. 

We will use a largely standard denotational semantics for $NRC_\lambda(Set, Bag)$,
in which sets and multisets are modeled as finitely-supported functions from 
their element types to Boolean values {0,1} or natural numbers respectively. 
This approach follows the so-called K-relation semantics for queries [23,18] as 
used for example in the HoTTSQL formalization [10]. The full typing rules and 
semantics are included in the full version of this paper [44]. 
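To make the multiplicity-based reading concrete, here is a small Python model of the collection operations, written as an illustration rather than as the formal semantics of the paper. Bags are represented by `collections.Counter` (finitely supported maps to natural numbers) and sets by `frozenset` (multiplicity 0 or 1); the function names are our own.

```python
from collections import Counter

def dedup(bag):
    """delta: keep each element of the bag once, yielding a set."""
    return frozenset(x for x, n in bag.items() if n > 0)

def promote(s):
    """iota: the bag containing each element of the set with multiplicity 1."""
    return Counter({x: 1 for x in s})

def bag_union(m, n):
    """Additive union: multiplicities are summed."""
    return m + n

def bag_diff(m, n):
    """Bag difference: multiplicities are subtracted, truncating at zero."""
    return m - n

def where_bag(m, cond):
    """One-armed conditional: m if cond holds, otherwise the empty bag."""
    return m if cond else Counter()

def empty_bag(m):
    """Emptiness test, definable as emptiness of the deduplicated bag."""
    return len(dedup(m)) == 0

m = Counter({"a": 2, "b": 1})
n = Counter({"a": 1, "c": 1})
```

With these definitions, `bag_union(m, n)` holds `a` three times, `bag_diff(m, n)` holds `a` and `b` once each, and `promote(dedup(m))` forgets the duplicate `a`.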

$NRC_\lambda(Set, Bag)$ subsumes previous systems including NRC [4,55], BQL [33]
and $NRC_\lambda$ [11,8]. In this paper, we restrict our attention to queries in which
collection types taking part in $\delta$, $\iota$ or bag difference contain only flat records.
There are various reasons for excluding function types from these operators: for 
starters, any concrete implementation that used function types in these positions 
would need to decide the equality of functions; secondly, our rewrite system can 
ensure that a term whose type does not contain function types has a normal form 
without lambda abstractions and applications only if any $\delta$, $\iota$, or bag difference
used in that term are applied to first-order collections. We thus want to exclude 


terms such as: 
lle US (2Sla + u({Ayz.y} U {Ayz.z})5 


which do not have an SQL representation despite having a flat collection type. 

In order to obtain simpler normal forms, in which comprehensions only
reference generators with a flat collection type, we also disallow nested collections
within $\delta$, $\iota$, and bag difference. We believe this is without loss of generality
because of Libkin and Wong’s results showing that allowing such operations at
nested types does not add expressiveness to BQL.

We have extended Wong’s normalizing rewrite rule system, so as to simplify 
queries to a form that is close to SQL, with no intermediate nested collections. 
Since our calculus is more liberal than Wong’s, allowing queries to be defined by 
mixing sets and bags and also using bag difference, we have added non-standard 
rules to take care of unwanted situations. In particular, we use the following 
constrained eta-expansions for comprehensions: 


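\[
\begin{array}{rcl}
\bigcup\{\delta(M - N) \mid \Theta\} & \leadsto & \bigcup\{\{z\} \mid \Theta, z \leftarrow \delta(M - N)\} \\
\biguplus\lbag \iota M \mid \Theta \rbag & \leadsto & \biguplus\lbag \lbag z \rbag \mid \Theta, z \leftarrow \iota M \rbag \\
\biguplus\lbag M - N \mid \Theta \rbag & \leadsto & \biguplus\lbag \lbag z \rbag \mid \Theta, z \leftarrow M - N \rbag
\end{array}
\]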
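
\[
\begin{array}{llcl}
\text{General normal forms} & M & ::= & X \mid \langle \vec{\ell} = \vec{M} \rangle \mid Q \mid R \\
\text{Base type terms} & X & ::= & x.\ell \mid c(\vec{X}) \mid \mathsf{empty}_{set}(Q^*) \mid \mathsf{empty}_{bag}(R^*) \\
\text{Set normal forms} & Q & ::= & \bigcup \vec{C} \\
 & C & ::= & \bigcup\{\{M\}\ \mathsf{where}_{set}\ X \mid \overrightarrow{x \leftarrow F}\} \\
 & F & ::= & \delta t \mid \delta(R_1^* - R_2^*) \\
\text{Bag normal forms} & R & ::= & \biguplus \vec{D} \\
 & D & ::= & \biguplus\lbag \lbag M \rbag\ \mathsf{where}_{bag}\ X \mid \overrightarrow{x \leftarrow G} \rbag \\
 & G & ::= & t \mid \iota Q^* \mid R_1^* - R_2^*
\end{array}
\]

Fig. 3. Nested relational normal forms.
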


The rationale of these rules is that in order to achieve, for comprehensions, 
a form that can be easily translated to an SQL select query, we need to move all 
the syntactic forms that are blocking to most normalization rules (i.e. promotion 
and bag difference) from the head of the comprehension to a generator. In order 
for this strategy to work out, we also need to know that the type of these 
subexpressions is flat, as we previously mentioned. 

In Figure 3 we show the grammar for the normal forms for terms of nested 
relational types, i.e. types of the following form: 


\[ \sigma ::= b \mid \langle \vec{\ell} : \vec{\sigma} \rangle \mid \{\sigma\} \mid \lbag \sigma \rbag \]


For ease of presentation, the grammar actually describes a “standardized” 
version of the normal forms in which: 


— $\emptyset$ is represented as the trivial union $\bigcup \vec{C}$ where $\vec{C}$ is the empty sequence;
$\mho$ has a similar representation using a trivial disjoint union;

— comprehensions without a guard are considered to be the same as those with
a trivial true guard:

\[ \bigcup\{\{M\} \mid \Theta\} = \bigcup\{\{M\}\ \mathsf{where}\ \mathsf{true} \mid \Theta\} \]

— singletons that do not appear as the head of a comprehension are represented
as trivial comprehensions:

\[ \{M\} = \bigcup\{\{M\} \mid\ \} \]


Each normal form M can be either a term of base type X, a tuple $\langle \vec{\ell} = \vec{M} \rangle$,
a set Q, or a bag R. The normal forms of sets and bags are rather similar, both 
being defined as unions of comprehensions with a singleton head. The gener- 
ators for set comprehensions F include deduplicated tables and deduplicated 
bag differences; the generators for bag comprehensions G must be either tables, 
promoted set queries, or bag differences. 

The non-terminals used as the arguments of emptiness tests, promotion, and 
bag difference have been marked with a star to emphasize the fact that they 
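
\[
\begin{array}{ll}
(\emptyset)^{sql} = \texttt{SELECT 42 WHERE 0 = 1} & (\mho)^{sql} = \texttt{SELECT 42 WHERE 0 = 1} \\
(x.\ell)^{sql} = x.\ell & (c(\vec{X}))^{sql} = (c)^{sql}((\vec{X})^{sql}) \\
\multicolumn{2}{l}{(\langle \vec{\ell} = \vec{X} \rangle)^{sql} = (X_1)^{sql}\ \texttt{AS}\ \ell_1, \ldots, (X_n)^{sql}\ \texttt{AS}\ \ell_n} \\
(\mathsf{empty}_{set}(Q^*))^{sql} = \texttt{NOT EXISTS}\ (Q^*)^{sql} & (\mathsf{empty}_{bag}(R^*))^{sql} = \texttt{NOT EXISTS}\ (R^*)^{sql} \\
(Q_1^* \cup Q_2^*)^{sql} = (Q_1^*)^{sql}\ \texttt{UNION}\ (Q_2^*)^{sql} & (R_1^* \uplus R_2^*)^{sql} = (R_1^*)^{sql}\ \texttt{UNION ALL}\ (R_2^*)^{sql} \\
(t)^{sql} = \texttt{SELECT * FROM}\ t & (R_1^* - R_2^*)^{sql} = (R_1^*)^{sql}\ \texttt{EXCEPT ALL}\ (R_2^*)^{sql} \\
(\delta t)^{sql} = \texttt{SELECT DISTINCT * FROM}\ t & (\iota(Q^*))^{sql} = (Q^*)^{sql} \\
\multicolumn{2}{l}{(\delta(R_1^* - R_2^*))^{sql} = \texttt{SELECT DISTINCT * FROM}\ ((R_1^*)^{sql}\ \texttt{EXCEPT ALL}\ (R_2^*)^{sql})\ r} \\
\multicolumn{2}{l}{(x \leftarrow F)^{sql} = ((F)^{sql})\ x \text{ if } F \text{ is closed}, \quad \texttt{LATERAL}\ ((F)^{sql})\ x \text{ otherwise}} \\
\multicolumn{2}{l}{(x \leftarrow G)^{sql} = ((G)^{sql})\ x \text{ if } G \text{ is closed}, \quad \texttt{LATERAL}\ ((G)^{sql})\ x \text{ otherwise}} \\
\multicolumn{2}{l}{(\bigcup\{\{M^*\}\ \mathsf{where}_{set}\ X \mid \overrightarrow{x \leftarrow F}\})^{sql} = \texttt{SELECT DISTINCT}\ (M^*)^{sql}\ \texttt{FROM}\ (\overrightarrow{x \leftarrow F})^{sql}\ \texttt{WHERE}\ (X)^{sql}} \\
\multicolumn{2}{l}{(\biguplus\lbag \lbag M^* \rbag\ \mathsf{where}_{bag}\ X \mid \overrightarrow{x \leftarrow G} \rbag)^{sql} = \texttt{SELECT}\ (M^*)^{sql}\ \texttt{FROM}\ (\overrightarrow{x \leftarrow G})^{sql}\ \texttt{WHERE}\ (X)^{sql}}
\end{array}
\]

Fig. 4. Translation to SQL
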


must have a flat collection type. The corresponding grammar can be obtained
from the grammar for nested normal forms by replacing the rule for M with the
following:

\[ M^* ::= \langle \vec{\ell} = \vec{X} \rangle \]


Normalized queries can be translated to SQL as shown in Figure 4 as long 
as they have a flat collection type. The translation uses SELECT DISTINCT and 
UNION where a set semantics is needed, and SELECT, UNION ALL and EXCEPT ALL 
in the case of bag semantics. Note that promotion expressions $\iota Q^*$ are translated
simply by translating Q*, because in SQL there is no type distinction between 
set and multiset queries: all query results are multisets, and sets are considered 
to be multisets having no duplicates. 
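The difference between the set and bag variants of these SQL operators can be observed directly. The following sketch uses Python's built-in `sqlite3` module with two hypothetical tables `r` and `s`; it covers only union and duplicate elimination, because SQLite does not implement `EXCEPT ALL`, so the bag difference case would need a different engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (a INTEGER);
    CREATE TABLE s (a INTEGER);
    INSERT INTO r VALUES (1), (1), (2);
    INSERT INTO s VALUES (2), (3);
""")

def run(sql):
    """Evaluate a one-column query and return its values in sorted order."""
    return sorted(value for (value,) in conn.execute(sql))

# Bag union keeps every copy: multiplicities are added.
bag_union = run("SELECT a FROM r UNION ALL SELECT a FROM s")

# Set union of the deduplicated tables: each value appears once.
set_union = run("SELECT DISTINCT a FROM r UNION SELECT DISTINCT a FROM s")

# Duplicate elimination on a single table.
dedup_r = run("SELECT DISTINCT a FROM r")
```

Here `bag_union` contains five rows while `set_union` contains three, even though both are returned by the database as ordinary query results.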

The other main complication in this translation is in handling generators
$x \leftarrow F$, $x \leftarrow G$ where F or G may be a non-closed expression $\iota(Q^*)$, $R_1^* - R_2^*$, or
$\delta(R_1^* - R_2^*)$ containing references to other locally-bound variables. To deal with
the resulting lateral variable references, we add the LATERAL keyword to such
queries. As explained earlier, the use of LATERAL can be problematic and we will 
return to this issue in Section 5. 


Remark 1 (Record flattening). The above translations handle queries that take
flat tables as input and produce flat results (collections of flat records $\langle \vec{\ell} : \vec{b} \rangle$). It
is straightforward to support queries that return nested records (i.e. records
containing other records, but not collections). For example, a query
$M : \lbag \langle b_1, \langle b_2, b_3 \rangle \rangle \rbag$ can be handled by defining both directions of the obvious
isomorphism $N : \lbag \langle b_1, \langle b_2, b_3 \rangle \rangle \rbag \cong \lbag \langle b_1, b_2, b_3 \rangle \rbag : N^{-1}$, normalizing the flat query
$N \circ M$, evaluating the corresponding SQL, and applying the inverse $N^{-1}$ to the
results. Such record flattening is described in detail by Cheney et al. [9] and is
implemented in Links, so we will use it from now on without further discussion.
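For the specific record shape used in the remark, the two directions of the isomorphism can be sketched on individual rows as follows. This is an illustration only: the function names are hypothetical, Python tuples stand in for unlabeled records, and `run_flat_query` is a stand-in for normalizing and evaluating the flat query on the database.

```python
def flatten(row):
    """N on one element: <b1, <b2, b3>> becomes <b1, b2, b3>."""
    b1, (b2, b3) = row
    return (b1, b2, b3)

def unflatten(row):
    """The inverse of N on one element: <b1, b2, b3> becomes <b1, <b2, b3>>."""
    b1, b2, b3 = row
    return (b1, (b2, b3))

def run_flat_query(nested_rows):
    """Stand-in for evaluating the flat query; here it only applies N row by row."""
    return [flatten(r) for r in nested_rows]

# A bag of nested records, including a duplicate row.
nested = [(1, ("a", True)), (1, ("a", True)), (2, ("b", False))]
flat = run_flat_query(nested)
restored = [unflatten(r) for r in flat]
```

Because the mapping is applied row by row, multiplicities in the bag are preserved in both directions.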
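
\[
\frac{(\Gamma, x_1 : \sigma_1, \ldots, x_{i-1} : \sigma_{i-1} \vdash L_i : \{\sigma_i\})_{i=1,\ldots,n} \qquad \Gamma, \overrightarrow{x : \sigma} \vdash M : \{\tau\}}{\Gamma \vdash \mathcal{G}^{set}(\overrightarrow{x \leftarrow L}; M) : \vec{\sigma} \rightharpoonup \{\tau\}}
\qquad
\frac{(\Gamma, x_1 : \sigma_1, \ldots, x_{i-1} : \sigma_{i-1} \vdash L_i : \{\sigma_i\})_{i=1,\ldots,n} \qquad \Gamma, \overrightarrow{x : \sigma} \vdash M : \lbag \tau \rbag}{\Gamma \vdash \mathcal{G}^{bag}(\overrightarrow{x \leftarrow L}; M) : \vec{\sigma} \rightharpoonup \lbag \tau \rbag}
\]

\[
\frac{\Gamma \vdash M : \vec{\sigma} \rightharpoonup \tau \qquad (\Gamma \vdash N_i : \sigma_i)_{i=1,\ldots,n}}{\Gamma \vdash M \circledast (\vec{N}) : \tau}
\qquad
\frac{\Gamma \vdash M : \vec{\sigma} \rightharpoonup \lbag \tau \rbag \qquad \Gamma \vdash N : \vec{\sigma} \rightharpoonup \lbag \tau \rbag}{\Gamma \vdash M - N : \vec{\sigma} \rightharpoonup \lbag \tau \rbag}
\]

\[
\frac{\Gamma \vdash M : \vec{\sigma} \rightharpoonup \{\tau\} \qquad \Gamma \vdash N : \vec{\sigma} \rightharpoonup \{\tau\}}{\Gamma \vdash M \cup N : \vec{\sigma} \rightharpoonup \{\tau\}}
\qquad
\frac{\Gamma \vdash M : \vec{\sigma} \rightharpoonup \lbag \tau \rbag \qquad \Gamma \vdash N : \vec{\sigma} \rightharpoonup \lbag \tau \rbag}{\Gamma \vdash M \uplus N : \vec{\sigma} \rightharpoonup \lbag \tau \rbag}
\]

\[
\frac{\Gamma \vdash M : \vec{\sigma} \rightharpoonup \lbag \tau \rbag}{\Gamma \vdash \delta M : \vec{\sigma} \rightharpoonup \{\tau\}}
\qquad
\frac{\Gamma \vdash M : \vec{\sigma} \rightharpoonup \{\tau\}}{\Gamma \vdash \iota M : \vec{\sigma} \rightharpoonup \lbag \tau \rbag}
\]

Fig. 5. $NRC_\mathcal{G}$ additional typing rules.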


4 A relational calculus of tabular functions 


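We now introduce $NRC_\mathcal{G}$, an extension of the calculus $NRC_\lambda(Set, Bag)$ providing
a new type of finite tabular function graphs (in the remainder of this paper,
also called simply “graphs”; they are similar to the finite maps and tables of
Gibbons et al. [20]). The syntax of $NRC_\mathcal{G}$ is defined as follows:

\[
\begin{array}{lrcl}
\text{Types} & \sigma, \tau & ::= & \cdots \mid \vec{\sigma} \rightharpoonup \tau \\
\text{Terms} & M, N & ::= & \cdots \mid \mathcal{G}^{set}(\Theta; N) \mid \mathcal{G}^{bag}(\Theta; N) \mid M \circledast (\vec{N})
\end{array}
\]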

Semantically, the type of graphs $\vec{\sigma} \rightharpoonup \tau$ will be interpreted as the set of
finite functions from sequences of values of type $\vec{\sigma}$ to values in $\tau$: such functions
can return non-trivial values only for a finite subset of their input type. In our
setting, we will require the output type of graphs to be a collection type (i.e.
$\tau$ shall be either $\{\tau'\}$ or $\lbag \tau' \rbag$ for some $\tau'$), and we will use $\emptyset$ or $\mho$ as the trivial
value. The typing rules involving graphs are shown in Figure 5.

Graphs are created using the graph abstraction operations $\mathcal{G}^{set}(\Theta; N)$ and
$\mathcal{G}^{bag}(\Theta; N)$, where $\Theta$ is a sequence of generators in the form $\overrightarrow{x \leftarrow M}$; the dual
operation of graph application is denoted by $M \circledast (\vec{N})$. An expression of the
form $\mathcal{G}^{set}(\overrightarrow{x \leftarrow M}; N)$ is used to construct a (finite) tabular function mapping
each sequence of values $R_1, \ldots, R_n$ in the sets $M_1, \ldots, M_n$ to the set $N[\vec{R}/\vec{x}]$.
If each $M_i$ has type $\{\sigma_i\}$ and N has type $\{\tau\}$, then the graph has type $\vec{\sigma} \rightharpoonup \{\tau\}$.
Similarly, if N has type $\lbag \tau \rbag$, $\mathcal{G}^{bag}(\overrightarrow{x \leftarrow M}; N)$ has type $\vec{\sigma} \rightharpoonup \lbag \tau \rbag$. The terms
$M_1, \ldots, M_n$ constitute the (finite) domain of this graph. When the kind of graph
abstraction (set-based or bag-based) is clear from the context or unimportant,
we will allow ourselves to write $\mathcal{G}(-;-)$ instead of $\mathcal{G}^{set}(-;-)$ or $\mathcal{G}^{bag}(-;-)$.
A graph G of type $\vec{\sigma} \rightharpoonup \tau$ can be applied to a sequence of terms $N_1, \ldots, N_n$
of type $\sigma_1, \ldots, \sigma_n$ to obtain a term of type $\tau$. If $G = \mathcal{G}(\overrightarrow{x \leftarrow L}; M)$, then we will
want the semantics of $\mathcal{G}(\overrightarrow{x \leftarrow L}; M) \circledast (\vec{N})$ to be the same as that of $M[\vec{N}/\vec{x}]$,
provided that each of the $N_i$ is in the corresponding element of the domain of
the graph. The typing rule does not enforce this requirement and if any of the
$N_i$ is not an element of $L_i$, the graph application will evaluate to an empty set
or bag (depending on $\tau$).
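The intended behaviour of graph abstraction and application can be sketched in Python with a dictionary from inputs to bags. This is a simplified model, not the paper's definition: it handles a single generator only, and the names `graph_bag` and `apply_graph` as well as the sample data are assumptions made for the example.

```python
from collections import Counter

# Hypothetical data for the running example: prescriptions and drug names.
PRES = [(45, 1), (45, 2), (45, 2), (46, 3)]
DRUG = {1: "hydrochloroquine", 2: "adderall", 3: "caffeine"}

def F(cid):
    """Bag of drugs prescribed to the candidate with the given cid."""
    return [DRUG[did] for (c, did) in PRES if c == cid]

def graph_bag(domain, body):
    """Tabulate body over a finite set domain (single-generator bag graph)."""
    return {x: Counter(body(x)) for x in set(domain)}

def apply_graph(graph, arg):
    """Graph application: look the argument up, defaulting to the empty bag."""
    return graph.get(arg, Counter())

g = graph_bag([45, 46], F)
in_domain = apply_graph(g, 45)      # same bag as evaluating F(45) directly
out_of_domain = apply_graph(g, 99)  # 99 is not in the domain
```

Applying the graph inside its domain agrees with evaluating the body directly, while an argument outside the domain yields the trivial value instead of an error.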

Graphs can also be merged by union, using $\cup$ or $\uplus$ depending on their output
collection kind. Furthermore, graphs that return bags can be subtracted from 
one another using bag difference; the deduplication and promotion operations 
also extend to graphs in the obvious way. 


Lemma 2. In $NRC_\mathcal{G}$, if $\Gamma \vdash M : \sigma$ and $\Gamma \vdash M : \tau$, then $\sigma = \tau$.


Whenever M is well typed and its typing environment is made clear by the
context, we will allow ourselves to write ty(M) for the type of M. Furthermore,
given a sequence of generators $\Theta = x_1 \leftarrow L_1, \ldots, x_n \leftarrow L_n$, such that for
$i = 1, \ldots, n$ we have $x_1 : \sigma_1, \ldots, x_{i-1} : \sigma_{i-1} \vdash L_i : \{\sigma_i\}$, we will write $ty(\Theta)$ to denote
the associated typing context:

\[ ty(\Theta) := x_1 : \sigma_1, \ldots, x_n : \sigma_n \]


4.1 Semantics and translation to $NRC_\lambda(Set, Bag)$


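The semantics of $NRC_\lambda(Set, Bag)$ is extended to $NRC_\mathcal{G}$ as follows:

\[
\begin{array}{rcl}
\llbracket \mathcal{G}^{set}(\overrightarrow{x \leftarrow L}; M) \rrbracket \rho\, (\vec{u}, v)
 & = & \left( \bigwedge_i \llbracket L_i \rrbracket \rho[x_1 \mapsto u_1, \ldots, x_{i-1} \mapsto u_{i-1}]\, u_i \right) \land \llbracket M \rrbracket \rho[\vec{x} \mapsto \vec{u}]\, v \\
\llbracket \mathcal{G}^{bag}(\overrightarrow{x \leftarrow L}; M) \rrbracket \rho\, (\vec{u}, v)
 & = & \left( \bigwedge_i \llbracket L_i \rrbracket \rho[x_1 \mapsto u_1, \ldots, x_{i-1} \mapsto u_{i-1}]\, u_i \right) \times \llbracket M \rrbracket \rho[\vec{x} \mapsto \vec{u}]\, v \\
\llbracket M \circledast (\vec{N}) \rrbracket \rho\, v & = & \llbracket M \rrbracket \rho\, (\llbracket \vec{N} \rrbracket \rho, v)
\end{array}
\]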


In this definition, graph abstractions are interpreted as collections of pairs of
values $(\vec{u}, v)$ where the $\vec{u}$ represent the input and v the corresponding output
of the graph; consequently, the semantics of a graph $\mathcal{G}^{set}(\overrightarrow{x \leftarrow L}; M)$ states that
the multiplicity of $(\vec{u}, v)$ is equal to the multiplicity of v in the semantics of M
(where each $x_i$ is mapped to $u_i$) if each $u_i$ is in the semantics of $L_i$, and zero
otherwise. The semantics of bag graph abstractions is similar, with $\times$ substituted
for $\land$ to allow multiplicities greater than one in the graph output.

For graph applications $M \circledast (\vec{N})$, the multiplicity of v is obtained as the
multiplicity of $(\llbracket \vec{N} \rrbracket \rho, v)$ in the semantics of M. The semantics of set and bag union,
bag difference, bag deduplication, and set promotion, as defined in
$NRC_\lambda(Set, Bag)$, are extended to graphs and remain otherwise unchanged in
$NRC_\mathcal{G}$.
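In fact (as noted for example by Gibbons et al. [20]), the graph constructs
of $NRC_\mathcal{G}$ are just a notational convenience: we can translate $NRC_\mathcal{G}$ back to
$NRC_\lambda(Set, Bag)$ by translating types $\vec{\sigma} \rightharpoonup \{\tau\}$ and $\vec{\sigma} \rightharpoonup \lbag \tau \rbag$ to $\{\langle \vec{\sigma}, \tau \rangle\}$ and
$\lbag \langle \vec{\sigma}, \tau \rangle \rbag$ respectively, and the term constructs are rewritten as follows:

\[
\begin{array}{rcll}
\mathcal{G}^{set}(\overrightarrow{x \leftarrow L}; M) & \leadsto & \bigcup\{\{\langle \vec{x}, y \rangle\} \mid \overrightarrow{x \leftarrow L}, y \leftarrow M\} & \\
\mathcal{G}^{bag}(\overrightarrow{x \leftarrow L}; M) & \leadsto & \biguplus\lbag \lbag \langle \vec{x}, y \rangle \rbag \mid \overrightarrow{x \leftarrow \iota L}, y \leftarrow M \rbag & \\
M \circledast (\vec{N}) & \leadsto & \bigcup\{\{y\}\ \mathsf{where}_{set}\ \vec{x} = \vec{N} \mid \langle \vec{x}, y \rangle \leftarrow M\} & (M : \vec{\sigma} \rightharpoonup \{\tau\}) \\
M \circledast (\vec{N}) & \leadsto & \biguplus\lbag \lbag y \rbag\ \mathsf{where}_{bag}\ \vec{x} = \vec{N} \mid \langle \vec{x}, y \rangle \leftarrow M \rbag & (M : \vec{\sigma} \rightharpoonup \lbag \tau \rbag)
\end{array}
\]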

5 Delateralization 


As explained at the end of Section 3, if a subexpression of the form $\iota(N)$ or
$N_1 - N_2$ contains free variables introduced by other generators in the query (i.e.
not globally-scoped table variables), such queries cannot be translated directly
to SQL, unless the SQL:1999 LATERAL keyword is used.

More precisely, we can give the following definition of lateral variable occur- 
rence. 


Definition 1. Given a query containing a comprehension $\bigcup\{M \mid \Theta, x \leftarrow N, \Theta'\}$
or $\biguplus\lbag M \mid \Theta, x \leftarrow N, \Theta' \rbag$ as a subterm, we say that x occurs laterally in $\Theta'$ if,
and only if, there is a binding $y \leftarrow N'$ in $\Theta'$ such that $x \in FV(N')$.


Since LATERAL is not implemented on all databases, and is sometimes imple- 
mented inefficiently, we would still like to avoid it. In this section we show how 
lateral occurrences can be eliminated even in the presence of bag promotion and 
bag difference, by means of a process we call delateralization. 

Using the NRCg constructs, we can delateralize simple cases of deduplication 
or multiset difference as follows: 


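\[
\begin{array}{l}
\biguplus\lbag M \mid x \leftarrow N, y \leftarrow \iota(P) \rbag \leadsto \biguplus\lbag M \mid x \leftarrow N, y \leftarrow \iota(\mathcal{G}(x \leftarrow \delta N; P) \circledast x) \rbag \\
\biguplus\lbag M \mid x \leftarrow N, y \leftarrow P_1 - P_2 \rbag \leadsto \\
\qquad \biguplus\lbag M \mid x \leftarrow N, y \leftarrow (\mathcal{G}(x \leftarrow \delta N; P_1) - \mathcal{G}(x \leftarrow \delta N; P_2)) \circledast x \rbag \\
\bigcup\{M \mid x \leftarrow N, y \leftarrow \delta(P_1 - P_2)\} \leadsto \\
\qquad \bigcup\{M \mid x \leftarrow N, y \leftarrow \delta(\mathcal{G}(x \leftarrow N; P_1) - \mathcal{G}(x \leftarrow N; P_2)) \circledast x\}
\end{array}
\]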


It is necessary to deduplicate N in the first two rules to ensure that the results 
correctly represent finite maps from the distinct elements of N to multisets of 
corresponding elements of P. (In any case, N needs to be deduplicated in order 
to be used as a set in $\mathcal{G}(x \leftarrow \delta N; -)$.)
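The role of deduplicating N can be checked on a small example. The Python sketch below models the first rule only, and is not the paper's implementation: bags are `collections.Counter` values, the graph is represented as a flat bag of (input, output) pairs, and `N`, `P` and `delateralized` are names and data invented for the illustration.

```python
from collections import Counter

N = ["a", "a", "b"]               # outer bag, with a duplicate element
P = {"a": {1, 2}, "b": {3}}       # P depends on x and returns a set

# Lateral evaluation: for each x drawn from N, iterate over the promoted P(x).
lateral = Counter((x, y) for x in N for y in P[x])

def delateralized(domain):
    """Build a closed lookup table over `domain`, then join it with N."""
    table = Counter((x, y) for x in domain for y in P[x])
    result = Counter()
    for x in N:
        for (x_in, y), multiplicity in table.items():
            if x == x_in:
                result[(x, y)] += multiplicity
    return result

good = delateralized(set(N))   # domain deduplicated, as in the rules above
bad = delateralized(N)         # domain left as a bag
```

With the deduplicated domain the join reproduces the lateral result, whereas leaving the duplicate in the domain counts the pairs for `a` twice over.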

Given a query expression in normal form, the above rules together with 
standard equivalences (such as commutativity of independent generators) can 
be used to delateralize it: that is, remove all occurrences of free variables in 
subexpressions of the form $\iota(N)$, $M_1 - M_2$, or $\delta(M_1 - M_2)$.


Theorem 1. If M is a flat query in normal form, then there exists M’ equiva- 
lent to M with no lateral variable occurrences. 
The proofs of correctness of the basic delateralization rules and of the above
theorem are in the full version of this paper [44].
To illustrate some subtleties of the translation, here is a trickier example: 


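\[ \biguplus\lbag M \mid x \leftarrow N, y \leftarrow Q - \iota(P) \rbag \]

where Q, P both depend on x. We proceed from the outside in, first delateralizing
the difference:

\[ \biguplus\lbag M \mid x \leftarrow N, y \leftarrow (\mathcal{G}(x \leftarrow \delta(N); Q) - \mathcal{G}(x \leftarrow \delta(N); \iota(P))) \circledast x \rbag \]

Note that this still contains a lateral subquery, namely $\iota(P)$ depends on x. After
translating back to $NRC_\lambda(Set, Bag)$, and delateralizing $\iota(P)$, the query normalizes
to:

\[
\begin{array}{l}
Q_1 = \bigcup\{\langle x, z \rangle \mid x \leftarrow \delta(N), z \leftarrow P\} \\
Q_2 = (\biguplus\lbag \langle x, z \rangle \mid x \leftarrow \iota\delta(N), z \leftarrow Q \rbag) - (\biguplus\lbag \langle x, z \rangle \mid x \leftarrow \iota\delta(N), \langle x', z \rangle \leftarrow \iota(Q_1), x = x' \rbag) \\
\biguplus\lbag M \mid x \leftarrow N, \langle x', y \rangle \leftarrow Q_2, x = x' \rbag
\end{array}
\]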


6 Query lifting and shredding 


In the previous sections, we have discussed how to translate queries with flat 
collection input and output to SQL. The shredding technique, introduced in [8], 
can be used to convert queries with nested output (but flat input) to multiple flat 
queries that can be independently evaluated on an SQL database, then stitched 
together to obtain the required nested result. This section provides an improved 
version of shredding, extended to a more liberal setting mixing sets and bags and 
allowing bag difference operations, and described using the graph operations we 
have introduced, allowing an easier understanding of the shredding process. 

We introduce, in Figure 6, a shredding judgment to denote the process by
which, given a normalized $NRC_\lambda(Set, Bag)$ query, each of its subqueries having
a nested collection type is lifted (in a manner analogous to lambda-lifting [30]) to
an independent graph query: more specifically, shredding will produce a
shredding environment (denoted by $\Phi, \Psi, \ldots$), which is a finite map associating special
graph variables $\varphi, \psi$ to $NRC_\mathcal{G}$ terms:

\[ \Phi, \Psi, \ldots ::= [\overrightarrow{\varphi \mapsto M}] \]


The shredding judgment has the following form: 


\[ \Phi; \Theta \vdash M \Rightarrow \breve{M} \mid \Psi \]

where the $\Rightarrow$ symbol separates the input (to the left) from the output (to the
right). The normalized $NRC_\lambda(Set, Bag)$ term M is the query that is being
considered for shredding; M may contain free variables declared in $\Theta$, which must be
a sequence of $NRC_\lambda(Set, Bag)$ set comprehension bindings. $\Theta$ is initially empty,
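
\[
\frac{X \text{ is a base term}}{\Phi; \Theta \vdash X \Rightarrow X \mid \Phi}
\qquad
\frac{(\Phi_{i-1}; \Theta \vdash M_i \Rightarrow \breve{M}_i \mid \Phi_i)_{i=1,\ldots,n}}{\Phi_0; \Theta \vdash \langle \vec{\ell} = \vec{M} \rangle \Rightarrow \langle \vec{\ell} = \vec{\breve{M}} \rangle \mid \Phi_n}
\]

\[
\frac{\varphi \notin dom(\Phi_n) \qquad (\Phi_{i-1}; \Theta \vdash C_i \Rightarrow \psi_i \circledast dom(\Theta) \mid \Phi_i)_{i=1,\ldots,n}}{\Phi_0; \Theta \vdash \bigcup \vec{C} \Rightarrow \varphi \circledast dom(\Theta) \mid (\Phi_n \setminus \vec{\psi})[\varphi \mapsto \bigcup \Phi_n(\vec{\psi})]}
\]

\[
\frac{\varphi \notin dom(\Phi_n) \qquad (\Phi_{i-1}; \Theta \vdash D_i \Rightarrow \psi_i \circledast dom(\Theta) \mid \Phi_i)_{i=1,\ldots,n}}{\Phi_0; \Theta \vdash \biguplus \vec{D} \Rightarrow \varphi \circledast dom(\Theta) \mid (\Phi_n \setminus \vec{\psi})[\varphi \mapsto \biguplus \Phi_n(\vec{\psi})]}
\]

\[
\frac{\varphi \notin dom(\Psi) \qquad \Phi; \Theta, \overrightarrow{x \leftarrow F} \vdash M \Rightarrow \breve{M} \mid \Psi}{\Phi; \Theta \vdash \bigcup\{\{M\}\ \mathsf{where}\ X \mid \overrightarrow{x \leftarrow F}\} \Rightarrow \varphi \circledast dom(\Theta) \mid \Psi[\varphi \mapsto \mathcal{G}(\Theta; \bigcup\{\{\breve{M}\}\ \mathsf{where}\ X \mid \overrightarrow{x \leftarrow F}\})]}
\]

\[
\frac{\varphi \notin dom(\Psi) \qquad \Phi; \Theta, \overrightarrow{x \leftarrow \delta G} \vdash M \Rightarrow \breve{M} \mid \Psi}{\Phi; \Theta \vdash \biguplus\lbag \lbag M \rbag\ \mathsf{where}\ X \mid \overrightarrow{x \leftarrow G} \rbag \Rightarrow \varphi \circledast dom(\Theta) \mid \Psi[\varphi \mapsto \mathcal{G}(\Theta; \biguplus\lbag \lbag \breve{M} \rbag\ \mathsf{where}\ X \mid \overrightarrow{x \leftarrow G} \rbag)]}
\]

\[ (\Phi \setminus \vec{\psi}) := [(\varphi \mapsto N) \in \Phi \mid \varphi \notin \vec{\psi}] \]

Fig. 6. Shredding rules.
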


but during shredding it is extended with parts of the input that have already
been processed. Similarly, the input shredding environment $\Phi$ is initially empty,
but will grow during shredding to collect shredded queries that have already
been generated. It is crucial, for our algorithm to work, that M be in the form
previously described in Figure 3, as this allows us to make assumptions on its
shape: in describing the judgment rules, we will use the same metavariables as
are used in that grammar.

The output of shredding consists of a shredded term $\breve{M}$ and an output
shredding environment $\Psi$. $\Psi$ extends $\Phi$ with the new queries obtained by shredding
M; $\breve{M}$ is an output $NRC_\mathcal{G}$ query obtained from M by lifting its collection typed
subqueries to independent queries defined in $\Psi$.

The rules for the shredding judgment operate as follows: the first rule
expresses the fact that a normalized base term X does not contain subexpressions
with nested collection type, therefore it can be shredded to itself, leaving the
shredding environment $\Phi$ unchanged; in the case of tuples, we perform
shredding pointwise on each field, connecting the input and output shredding
environments in a pipeline, and finally combining together the shredded subterms in
the obvious way.

The shredding of collection terms (i.e. unions and comprehensions) is
performed by means of query lifting: we turn the collection into a globally defined
(graph) query, which will be associated to a fresh name $\varphi$ and instantiated to the
local comprehension context by graph application. This operation is reminiscent
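
\[
\frac{}{\vdash [] : \cdot}
\qquad
\frac{\vdash \Phi : \Gamma \qquad \Gamma \vdash M : \vec{\sigma} \rightharpoonup \tau \qquad \varphi \notin dom(\Phi)}{\vdash \Phi[\varphi \mapsto M] : (\Gamma, \varphi : \vec{\sigma} \rightharpoonup \tau)}
\]

Fig. 7. Typing rules for shredding environments.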


of the lambda lifting and closure conversion techniques used in the implementation
of functional languages to convert local function definitions into global ones.
Thus, when shredding a collection, besides processing its subterms recursively,
we will need to extend the output shredding environment with a definition for
the new global graph $\varphi$. In the interesting case of comprehensions, $\varphi$ is defined by
graph-abstracting over the comprehension context $\Theta$; notice that, since we are
only shredding normalized terms, we know that they have a certain shape and,
in particular, the judgment for bag comprehensions must ensure that generators
be converted into sets.

The shredding of set and bag unions is performed by recursion on the
subterms, using the same plumbing technique we employed for tuples; additionally,
we optimize the output shredding environment by removing the graph queries
$\vec{\psi}$ resulting from recursion, since they are absorbed into the new graph $\varphi$.

Notice that since the comprehension generators of our normalized queries 
must have a flat collection type, they do not need to be processed recursively. 
Furthermore, since our normal forms ensure that promotion and bag difference 
terms can only appear as comprehension generators, we do not need to provide 
rules for these cases. 

The shredding environments used by the shredding judgment must be well
typed, in the sense described by the rules of Figure 7: the judgment $\vdash \Phi : \Gamma$ means
that the graph variables of $\Phi$ are mapped to terms whose type is described by
$\Gamma$. Whenever we add a mapping $[\varphi \mapsto M]$ to $\Phi$, we must make sure that M is
well typed (of graph type) in the typing environment $\Gamma$ associated to $\Phi$.

If $\vdash \Phi : \Gamma$, we will write $ty(\Phi)$ to refer to the typing environment $\Gamma$ associated
to $\Phi$. The following result states that shredding preserves well-typedness:


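Theorem 2. Let $\Theta$ be well-typed and $ty(\Theta) \vdash M : \sigma$. If $\Phi; \Theta \vdash M \Rightarrow \breve{M} \mid \Psi$, then:

— $\Psi$ is well-typed
— $ty(\Psi), ty(\Theta) \vdash \breve{M} : \sigma$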


We now intend to prove the correctness of shredding: first, we state a lemma 
which we can use to simplify certain expressions involving the semantics of graph 
application: 


Definition 2. Let $\Theta$ be a closed, well-typed sequence of generators. A
substitution $\rho$ is a model of $\Theta$ (notation: $\rho \vDash \Theta$) if, and only if, for all $x \in dom(\Theta)$, we
have $\llbracket \Theta(x) \rrbracket \rho\, (\rho(x)) > 0$.


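Lemma 3. 1. $\llbracket (\bigcup \vec{M}) \circledast (\vec{N}) \rrbracket \rho = \bigvee_i \llbracket M_i \circledast (\vec{N}) \rrbracket \rho$
2. If $\rho \vDash \Theta$, then for all M we have $\llbracket \mathcal{G}(\Theta; M) \circledast (dom(\Theta)) \rrbracket \rho = \llbracket M \rrbracket \rho$.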


To state the correctness of shredding, we need the following notion of shred- 
ding environment substitution. 


Definition 3. For every well-typed shredding environment $\Phi$, the substitution of
$\Phi$ into an $NRC_G$ term $M$ (notation: $M\Phi$) is defined as the operation replacing
within $M$ every free variable $\varphi \in dom(\Phi)$ with $(\Phi(\varphi))\Phi$ (i.e. the value assigned
by $\Phi$ to $\varphi$, after recursively substituting $\Phi$).


We can easily show that the above definition is well posed for well-typed ®. 
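
To make Definition 3 concrete, the following is a minimal Python sketch of the recursive substitution. The term encoding (nested tuples with `("var", name)` leaves) and the names `substitute` and `env` are illustrative assumptions, not the paper's $NRC_G$ syntax. Termination relies on the ordering property of well-typed environments noted in Section 6.1: each entry only refers to graph variables defined before it.

```python
# Toy terms: ("var", name) is a variable; any other tuple is (tag, child, ...).
# This encoding is an illustrative assumption, not the paper's NRC_G syntax.

def substitute(term, env):
    """Replace every variable bound in env by its definition, which is itself
    recursively substituted (the operation written M Phi in Definition 3)."""
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "var":
        name = term[1]
        if name in env:
            return substitute(env[name], env)
        return term  # ordinary (non-graph) variable: left untouched
    if isinstance(term, tuple):
        tag, *children = term
        return (tag, *(substitute(child, env) for child in children))
    return term  # constants and other leaves


# phi2 refers to phi1, which is defined before it in the environment.
env = {
    "phi1": ("graph", ("const", 1)),
    "phi2": ("union", ("var", "phi1"), ("var", "x")),
}
result = substitute(("app", ("var", "phi2")), env)
```

In this sketch a cyclic environment would loop forever, which is why well-typedness of the environment matters for the definition to be well posed.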

We now show that shredding preserves the semantics of the input term, in the 
sense that the term obtained by substituting the output shredding environment 
into the output term is equivalent to the input. 


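Theorem 3 (Correctness of shredding). Let $\Theta$ be well-typed and $ty(\Theta) \vdash M : \sigma$.
If $\Phi; \Theta \vdash M \Rrightarrow \breve{M} \mid \Psi$, then, for all $\rho \vDash \Theta$, we have $[\![M]\!]\rho = [\![\breve{M}\Psi]\!]\rho$.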


Proof. By induction on the shredding judgment. We comment on two representative
cases:


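— in the set comprehension case, we want to prove

$[\![\bigcup\{\{M\} \text{ where } X \mid \vec{x} \leftarrow \vec{F}\}]\!]\rho\,v = [\![(\varphi \circledast (dom(\Theta)))(\Psi[\varphi \mapsto \mathcal{G}(\Theta; \bigcup\{\{\breve{M}\} \text{ where } X \mid \vec{x} \leftarrow \vec{F}\})])]\!]\rho\,v$

where $\rho \vDash \Theta$. We rewrite the lhs as follows:

$[\![\bigcup\{\{M\} \text{ where } X \mid \vec{x} \leftarrow \vec{F}\}]\!]\rho\,v = \bigvee_{\vec{u}} ([\![M]\!]\rho_n = v) \land ([\![X]\!]\rho_n) \land ([\![F_i]\!]\rho_{i-1}\,u_i)_{i=1,\ldots,n}$

where $\rho_i = \rho[x_1 \mapsto u_1, \ldots, x_i \mapsto u_i]$; note that $\rho_i \vDash \Theta, x_1 \leftarrow F_1, \ldots, x_i \leftarrow F_i$ for all
$i = 1, \ldots, n$ and all $u_i$ such that $[\![F_i]\!]\rho_{i-1}\,u_i$. By the definition of substitution and by
Lemma 3, we rewrite the rhs:

$= [\![(\mathcal{G}(\Theta; \bigcup\{\{\breve{M}\} \text{ where } X \mid \vec{x} \leftarrow \vec{F}\}))\Psi \circledast (dom(\Theta))]\!]\rho\,v$
$= [\![\bigcup\{\{\breve{M}\Psi\} \text{ where } X \mid \vec{x} \leftarrow \vec{F}\}]\!]\rho\,v$
$= \bigvee_{\vec{u}} ([\![\breve{M}\Psi]\!]\rho_n = v) \land ([\![X]\!]\rho_n) \land ([\![F_i]\!]\rho_{i-1}\,u_i)_{i=1,\ldots,n}$

In both disjunctions, any $\vec{u}$ for which some $[\![F_i]\!]\rho_{i-1}\,u_i$ fails contributes 0.
Therefore, we only need to consider those $\vec{u}$ such that $\rho_n \vDash \Theta, \vec{x} \leftarrow \vec{F}$.
Then, to prove the thesis, we only need to show:

$[\![M]\!]\rho_n = [\![\breve{M}\Psi]\!]\rho_n$

which follows by induction hypothesis, for $\rho_n \vDash \Theta, \vec{x} \leftarrow \vec{F}$.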
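— in the set union case, we want to prove

$[\![\bigcup \vec{C}]\!]\rho\,v = [\![(\varphi \circledast (dom(\Theta)))(\Psi[\varphi \mapsto \bigcup \Psi(\vec{\psi})])]\!]\rho\,v$

where $\rho \vDash \Theta$. We rewrite the lhs as follows:

$[\![\bigcup \vec{C}]\!]\rho\,v = \bigvee_i [\![C_i]\!]\rho\,v$

By the definition of substitution and by Lemma 3, we rewrite the rhs:

$[\![(\varphi \circledast (dom(\Theta)))(\Psi[\varphi \mapsto \bigcup \Psi(\vec{\psi})])]\!]\rho\,v$
$= [\![(\bigcup \Psi(\vec{\psi}))\Psi \circledast (dom(\Theta))]\!]\rho\,v$
$= \bigvee_i [\![(\Psi(\psi_i))\Psi \circledast (dom(\Theta))]\!]\rho\,v$

By induction hypothesis and unfolding of definitions, we know for all $i$:

$[\![C_i]\!]\rho = [\![(\psi_i \circledast (dom(\Theta)))\Psi]\!]\rho = [\![(\Psi(\psi_i))\Psi \circledast (dom(\Theta))]\!]\rho$

which proves the thesis.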


6.1 Reflecting shredded queries into $NRC_\lambda(Set, Bag)$


The output of the shredding judgment is a stratified version of the input term, 
where each element of the output shredding environment provides a layer of col- 
lection nesting; furthermore, the output is ordered so that each element of the 
shredding environment only references graph variables defined to its left, which 
is convenient for evaluation. Our goal is to evaluate each shredded item as an 
independent query: however, these items are not immediately convertible to flat 
queries, partly because their type is still nested, and also due to the presence of 
graph operations introduced during shredding. We thus need to provide a trans- 
lation operation capable of converting the output of shredding into independent 
flat terms of $NRC_\lambda(Set, Bag)$. This translation uses two main ingredients:


— an index function to convert graph variable references to a flat type I of
indices, such that $\varphi$, $\vec{x}$ are recoverable from $index(\varphi, \vec{x})$;
— a technique to express graphs as standard $NRC_\lambda(Set, Bag)$ relations.


The resulting translation, denoted by $\lfloor - \rfloor$, is shown in Figure 8. Let us
remark that the translation need be defined only for term forms that can be
produced as the output of shredding: this allows us, for instance, not to consider
terms such as $\iota M$ or $M - N$, which can only appear as part of flat generators
of comprehensions or graphs.

We discuss briefly the interesting cases of the definition of the flattening 
translation. Base expressions $X$ are expressible in $NRC_\lambda(Set, Bag)$, therefore
they can be mapped to themselves (this is also true for empty(M), since nor- 
malization ensures that the type of M be a flat collection). Graph applications 
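$\lfloor X \rfloor = X$    $\lfloor \langle \vec{\ell} = \vec{M} \rangle \rfloor = \langle \vec{\ell} = \lfloor \vec{M} \rfloor \rangle$
$\lfloor \bigcup \vec{C} \rfloor = \bigcup \lfloor \vec{C} \rfloor$    $\lfloor \biguplus \vec{D} \rfloor = \biguplus \lfloor \vec{D} \rfloor$
$\lfloor \varphi \circledast (\vec{x}) \rfloor = index(\varphi, \vec{x})$
$\lfloor \bigcup\{\{M\} \text{ where } X \mid \vec{x} \leftarrow \vec{F}\} \rfloor = \bigcup\{\{\lfloor M \rfloor\} \text{ where } X \mid \vec{x} \leftarrow \vec{F}\}$
$\lfloor \biguplus\{\lbag M \rbag \text{ where } X \mid \vec{x} \leftarrow \vec{G}\} \rfloor = \biguplus\{\lbag \lfloor M \rfloor \rbag \text{ where } X \mid \vec{x} \leftarrow \vec{G}\}$
$\lfloor \mathcal{G}(\vec{x} \leftarrow \vec{F}; M) \rfloor = \bigcup\{\{\langle \vec{x}, y \rangle\} \mid \vec{x} \leftarrow \vec{F}, y \leftarrow \lfloor M \rfloor\}$    (set-valued graphs)
$\lfloor \mathcal{G}(\vec{x} \leftarrow \vec{F}; M) \rfloor = \biguplus\{\lbag \langle \vec{x}, y \rangle \rbag \mid \vec{x} \leftarrow \iota\vec{F}, y \leftarrow \lfloor M \rfloor\}$    (bag-valued graphs)


Fig. 8. Flattening embedding of shredded queries into $NRC_\lambda(Set, Bag)$.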


$\varphi \circledast (\vec{x})$, as we said, are translated with the help of an abstract index operation:
this is where the primary purpose of the translation is accomplished, by flatten- 
ing a collection type to the flat type I, making it possible for a shredded query to 
be converted to SQL; although we do not specify the concrete implementation of 
index, it is worth noting that it must store the arguments of the graph application
along with the (quoted) name of the graph variable $\varphi$. Tuples, unions, and
comprehensions only require a recursive translation of their subterms: however 
the generators of comprehensions must have a flat collection type, so no recursion 
is needed there. Finally, we translate graphs as collections of the pairs obtained 
by associating elements of the domain of the graph to the corresponding output; 
it is simple to come up with a comprehension term building such a collection: 
set-valued graphs are translated using set comprehension, while bag-valued ones 
use bag comprehension (this also means that in the latter case the generators 
for the domain of the graph, which are set-typed, must be wrapped in a $\iota$).

We can prove that the flattening embedding produces flat-typed terms, as 
expected. 


Definition 4. A well-typed set comprehension generator $\Theta$ is flat-typed if, and
only if, for all $x \in dom(\Theta)$, there exists a flat type $\sigma$ such that $ty(\Theta(x)) = \{\sigma\}$.

A well-typed shredding environment $\Phi$ is flat-typed if, and only if, for all
$\varphi \in dom(\Phi)$, we have that $ty(\lfloor \Phi(\varphi) \rfloor)$ is a flat collection type.


Lemma 4. Suppose $\Phi; \Theta \vdash M \Rrightarrow \breve{M} \mid \Psi$, where $\Phi$ and $\Theta$ are flat-typed. Then,
$\breve{M}$ and $\Psi$ are also flat-typed.


It is important to note that the composition of shredding and |-| does not 
produce normalized $NRC_\lambda(Set, Bag)$ terms: when we shred a comprehension, we
add to the output shredding environment a graph returning a comprehension, 
and when we translate this to $NRC_\lambda(Set, Bag)$ we get two nested comprehensions:


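$\lfloor \mathcal{G}(x \leftarrow \delta t; \biguplus\{\lbag M \rbag \mid y \leftarrow \iota Q^*\}) \rfloor = \biguplus\{\lbag \langle x, z \rangle \rbag \mid x \leftarrow \iota\delta t, z \leftarrow \biguplus\{\lbag \lfloor M \rfloor \rbag \mid y \leftarrow \iota Q^*\}\}$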
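$(X : \sigma)\,\Xi \triangleq X$    (if $X$ is not an index)
$(\langle \vec{\ell} = \vec{N} \rangle : \langle \vec{\ell} : \vec{\sigma} \rangle)\,\Xi \triangleq \langle \vec{\ell} = (\vec{N} : \vec{\sigma})\,\Xi \rangle$
$(\{\vec{N}\} : \{\sigma\})\,\Xi \triangleq \{(\vec{N} : \sigma)\,\Xi\}$
$(\lbag \vec{N} \rbag : \lbag \sigma \rbag)\,\Xi \triangleq \lbag (\vec{N} : \sigma)\,\Xi \rbag$
$(index(\varphi, \vec{V}) : \{\sigma\})\,\Xi \triangleq \bigcup\{\{(p.2 : \sigma)\,\Xi\} \mid p \leftarrow \Xi(\varphi), p.1 = \langle \vec{V} \rangle\}$
$(index(\varphi, \vec{V}) : \lbag \sigma \rbag)\,\Xi \triangleq \biguplus\{\lbag (p.2 : \sigma)\,\Xi \rbag \mid p \leftarrow \Xi(\varphi), p.1 = \langle \vec{V} \rangle\}$


Fig. 9. The stitching function.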


In fact, not only is this term not in normal form, but it may even contain, within 
Q*, a lateral reference to x; thus, after a flattening translation, we will always 
require the resulting queries to be renormalized and, if needed, delateralized. 

Let $norm$ denote $NRC_\lambda(Set, Bag)$ normalization, and $S$ denote the evaluation
of relational normal forms: we define the shredded value set $\Xi$ corresponding
to a shredding environment $\Phi$ as follows:


$\Xi \triangleq \{\varphi \mapsto S(norm(\lfloor M \rfloor)) \mid [\varphi \mapsto M] \in \Phi\}$


The evaluation S is ordinarily performed by a DBMS after converting the 
$NRC_\lambda(Set, Bag)$ query to SQL, as described in Section 5. The result of this
evaluation is reflected in a programming language such as Links as a list of 
records. 


6.2 The stitching function 


Given an $NRC_\lambda(Set, Bag)$ term with nested collections, we have first shredded it,
obtaining a shredded $NRC_G$ term $\breve{M}$ and a shredding environment $\Phi$ containing
$NRC_G$ graphs; then we have used a flattening embedding to reflect both $\breve{M}$ and
$\Phi$ back into the flat fragment of $NRC_\lambda(Set, Bag)$; next we used normalization
and DBMS evaluation to convert the shredding environment into a shredded
value set $\Xi$. As the last step to evaluate $M : \tau$, we need to combine $\lfloor \breve{M} \rfloor$ and
$\Xi$ together to reconstruct the correct nested value $(\lfloor \breve{M} \rfloor : \tau)\,\Xi$ by stitching
together partial flat values.

The stitching function is shown in Figure 9: its job is to visit all the components
of tuples and collections, ignoring atomic values other than indices along
the way. The real work is performed when an $index(\varphi, \vec{V})$ is found: conceptually,
the index should be replaced by the result of the evaluation of $\varphi \circledast (\vec{V})$.
Remember that $\Xi$ contains the result of the evaluation of the graph function $\varphi$
after translation to $NRC_\lambda(Set, Bag)$, i.e. a collection of pairs associating each
input of $\varphi$ to the corresponding output: then, to obtain the desired result, we
can take $\Xi(\varphi)$, filter all the pairs $p$ whose first component is $\langle \vec{V} \rangle$, and return
the second component of $p$ after a recursive stitching. Finally, observe that we
track the result type argument in order to disambiguate whether to construct a
set or multiset when we encounter an index.
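
The stitching procedure described above can be sketched in Python as follows. This is only an illustration under assumptions of ours: flat values are Python atoms, dicts (tuples) and lists (collections); an index is encoded as the triple `("index", phi, args)`; types are small tagged tuples; and the shredded value set `xi` maps each graph name to a list of (input, output) pairs. None of these encodings, nor the example data, come from the paper.

```python
def dedup(items):
    """Keep the first occurrence of each item (works for unhashable dicts)."""
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique


def stitch(value, ty, xi):
    """Rebuild a nested value from a flat one, resolving indices through xi."""
    kind = ty[0]
    if kind == "base":
        return value  # atomic value that is not an index
    if kind == "tuple":
        return {label: stitch(value[label], t, xi) for label, t in ty[1].items()}
    # kind is "set" or "bag": either an index to resolve or a flat collection
    if isinstance(value, tuple) and value[0] == "index":
        _, phi, args = value
        # keep the pairs whose first component matches, stitch their outputs
        elems = [stitch(out, ty[1], xi) for key, out in xi[phi] if key == args]
    else:
        elems = [stitch(v, ty[1], xi) for v in value]
    # the type argument decides between set and multiset results
    return dedup(elems) if kind == "set" else elems


ROW = ("tuple", {"dept": ("base",), "emps": ("bag", ("base",))})
flat_top = [
    {"dept": "Sales", "emps": ("index", "phi", ("Sales",))},
    {"dept": "R&D", "emps": ("index", "phi", ("R&D",))},
]
xi = {"phi": [(("Sales",), "Ann"), (("Sales",), "Bob"),
              (("Sales",), "Ann"), (("R&D",), "Cy")]}
nested = stitch(flat_top, ("bag", ROW), xi)
```

Changing the type of `emps` from `("bag", ...)` to `("set", ...)` makes the same index resolve to a deduplicated collection, which is the role the type argument plays in Figure 9.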
Theorem 4 (Correctness of stitching). Let $\Theta$ be well-typed and $ty(\Theta) \vdash M : \sigma$.
Let $\Phi$ be well-typed, and suppose $\Phi; \Theta \vdash M \Rrightarrow \breve{M} \mid \Psi$. Let $\Xi$ be the result of
evaluating the flattened queries in $\Psi$ as above. Then $[\![\breve{M}\Psi]\!]\rho = [\![(\lfloor \breve{M} \rfloor : \sigma)\,\Xi]\!]\rho$.


The full correctness result follows by combining Theorems 3 and 4.


Corollary 1. For all $M$ such that $\vdash M : \tau$, suppose $M \Rrightarrow \breve{M} \mid \Psi$, and let $\Xi$
be the shredded value set obtained by evaluating the flattened queries in $\Psi$. Then


$[\![M]\!] = [\![(\lfloor \breve{M} \rfloor : \tau)\,\Xi]\!]$.


7 Related work 


Work on language-integrated query and comprehension syntax has taken place 
over several decades in both the database and programming language commu- 
nities. We discuss the most closely related work below. 


Comprehensions, normalization and language integration. The database community
had already begun in the late 1980s to explore proposals for so-called non-
first-normal-form relations in which collections could be nested inside other col- 
lections [46], but following Trinder and Wadler’s initial work connecting database 
queries with monadic comprehensions [50], query languages based on these foun- 
dations were studied extensively, particularly by Buneman et al. [4,3]. For our 
purposes, Wong’s work on query normalization and translation to SQL [55] is 
the most important landmark; this work provided the basis for practical imple- 
mentations such as Kleisli and later Links. Almost as important is the later work 
by Libkin and Wong [33], studying the questions of expressiveness of bag query 
languages via a language BQL that extended basic NRC with deduplication and
bag difference operators. They related this language to NRC with set semantics 
extended with aggregation (count/sum) operations, but did not directly address 
the question of normalizing and translating BQL queries to SQL. Grust and 
Scholl [28] were early advocates of the use of comprehensions mixing set, bag 
and other monadic collections for query rewriting and optimization, but did not 
study normalization or translatability properties. 

Although comprehension-based queries began to be used in general-purpose 
programming languages with the advent of Microsoft LINQ [36] and Links [12], 
Cooper [11] made the next important foundational contribution by extending 
Wong’s normalization result to queries containing higher-order functions and 
showing that an effect system could be used to safely compose queries using 
higher-order functions even in an ambient language with side-effects and recur- 
sive functions that cannot be used in queries. This work provided the basis for 
subsequent development of language-integrated query in Links [34] and was later 
adapted for use in F# [7], Scala [41], and by Kiselyov et al. [48] in the OCaml 
library QUEΛ. However, on revisiting Cooper’s proof to extend it to heterogeneous
queries, we found a subtle gap in the proof, which was corrected in a recent
paper [43]; the original result was correct. As a result, in this paper we focus on 
first-order fragments of these languages without loss of generality. 
Giorgidze et al. [22] have shown how to support non-recursive datatypes (i.e.
sums) and Grust and Ulrich [29] built on this to show how to support function 
types in query results using defunctionalization [29]. We considered using sums 
to support a defunctionalization-style strategy for query lifting, but Giorgidze 
et al. [22] map sum types to nested collections, which makes their approach 
unsuitable to our setting. Wong’s original normalization result also considered 
sum types, but to the best of our knowledge normalization for NRC, (Set, Bag) 
extended with sum types has not yet been proved. 

Recent work by Suzuki et al. [48] has outlined further extensions to
language-integrated query in the QUEΛ system, which is based on finally-tagless
syntax [6] and employs Wong’s and Cooper’s rewrite rules; Katsushima and
Kiselyov’s subsequent short paper [31] outlined extensions to handling ordering and
grouping. Kiselyov and Katsushima [32] present an extension to QUEΛ called
SQUR to handle ordering based on effect typing, and they provide an elegant
translation from SQUR queries to SQL based on normalization-by-evaluation.
Okura and Kameyama [39] outline an extension to handle SQL-style grouping
and aggregation operators in QUEΛG; however, their approach potentially
generates lateral variable occurrences inside grouping queries. These systems QUEΛ,
SQUR and QUEΛG consider neither heterogeneity nor nested results.

Our adoption of tabulated functions (graphs) is inspired in part by Gibbons 
et al. [20], who provided an elegant rational reconstruction of relational algebra 
showing how standard principles for reasoning about queries arise from adjunc- 
tions. They employed types for (finite) maps and tables to show how joins can be 
implemented efficiently, and observed that such structures form a graded monad. 
We are interested in further exploring these structures and extending our work 
to cover ordering, grouping and aggregation. 


Query decorrelation and delateralization. There is a large literature on query
decorrelation, for example to remove aggregation operations from SELECT or 
WHERE clauses (see e.g. [38,5] for further discussion). Delateralization appears 
related to decorrelation, but we are aware of only a few works on this problem, 
perhaps because most DBMSs only started to support LATERAL in the last few 
years. (Microsoft SQL Server has supported similar functionality for much longer 
through a keyword APPLY.) Our delateralization technique appears most closely 
related to Neumann and Kemper’s work on query unnesting [38]. In this con- 
text, unnesting refers to removal of “dependent join” expressions in a relational 
algebraic query language; such joins appear to correspond to lateral subqueries. 
This approach is implemented in the HyPer database system, but is not accompanied
by a proof of correctness, nor does it handle nested query results. It
would be interesting to formalize this approach (or others from the decorrelation 
literature) and relate it to delateralization. 


Querying nested collections. Our approach to querying nested heterogeneous
collections clearly specializes to the homogeneous cases for sets and multisets 
respectively, which have been studied separately. Van den Bussche’s work on 
simulating queries on nested sets using flat ones [54] has also inspired subsequent
work on query shredding, flattening and (in this paper) lifting, though
the simulation technique itself does not appear practical (as discussed in the 
extended version of Cheney et al. [9]). More recently, Benedikt and Pradic [1] 
presented results on representing queries on nested collections using a bounded 
number of interpretations (first-order logic formulas corresponding to definable 
flat query expressions) in the context of their work on synthesizing NRC queries 
from proofs. This approach considers set-valued NRC only, and its relationship
to our approach should be investigated further. 

Cheney et al.’s previous work on query shredding for multiset queries [8] is 
different in several important respects. In that work we did not consider dedupli- 
cation and bag difference operations from BQL, which Libkin and Wong showed 
cannot be expressed in terms of other NRC operations. The shredding transla- 
tion was given in several stages, and while each stage is individually comprehen- 
sible, the overall approach is not easy to understand. Finally, the last stages of 
the translation relied on SQL features not present (or expressible) in the source 
language, such as ordering and the SQL:1999 ROW_NUMBER construct, to synthe- 
size uniform integer keys. Our approach, in contrast, handles set, bag, and mixed 
queries, and does not rely on any SQL:1999 features. 

In a parallel line of work, Grust et al. [26,21,51,53,52] have developed a num- 
ber of approaches to querying nested list data structures, first in the context of 
XML processing [24] and subsequently for NRC-like languages over lists. The
earlier approach [26], named loop-lifting (not to be confused with query lifting!) 
made heavy use of SQL:1999 capabilities for numbering and indexing to decouple 
nested collections from their context, and was implemented in both Links [51] 
and earlier versions of the Database Supported Haskell library [21], both of which 
relied on an advanced query optimizer called Pathfinder [27] to optimize these 
queries. The more recent approach, implemented by Ulrich in the current version 
of DSH and described in detail in his thesis [52], is called query flattening and 
is instead based on techniques from nested data parallelism [2]. Both loop-lifting 
and query flattening are very powerful, and do not rely on an initial normaliza- 
tion stage, while supporting a rich source language with list semantics, ordering, 
grouping, aggregation, and deduplication which can in principle emulate set or 
multiset semantics. However, to the best of our knowledge no correctness proofs 
exist for either technique. We view finding correctness results for richer query 
languages as an important challenge for future work. 

Another parallel line of work started by Fegaras and Maier [15,14] considers 
heterogeneous query languages based on monoid comprehensions, with set, list, 
and bag collections as well as grouping, aggregation and ordering operations, in 
the setting of object-oriented databases, and forms the basis for complex object 
database systems such as λDB [16] and Apache MRQL [14]. However,
Wong-style normalization results or translations from flat or nested queries to SQL are
not known for these calculi. 


Lambda-lifting and closure conversion. Since Johnsson’s original work [30],
lambda-lifting and closure conversion have been studied extensively for functional
languages, with Minamide et al.’s typed closure conversion [37] of particular
interest in compilers employing typed intermediate languages. We plan
to study whether known optimizations in the lambda-lifting and closure con- 
version literature offer advantages for query lifting. The immediate important 
next step is to implement our approach and compare it empirically with previ- 
ous techniques such as query shredding and query flattening. By analogy with 
lambda-lifting and closure conversion, we expect additional optimizations to be 
possible by a deeper analysis of how variables/fields are used in lifted subqueries. 
Another problem we have not resolved is how to deal with deduplication or bag 
difference at nested collection types in practice. Libkin and Wong [33] showed 
that such nesting can be eliminated from BQL queries, but their results do not 
provide a constructive algorithm for eliminating the nesting. 


8 Conclusions 


Monadic comprehensions have proved to be a remarkably durable foundation for 
database programming and language-integrated query, and have led to language
support (LINQ for .NET, Quill for Scala) with widespread adoption. Recent 
work has demonstrated that techniques for evaluating queries over nested collec- 
tions, such as query shredding or query flattening, can offer order-of-magnitude 
speedups in database applications [19] without sacrificing declarativity or read- 
ability. However, query shredding lacks the ability to express common operations 
such as deduplication, while query flattening is more expressive but lacks a de- 
tailed proof of correctness, and both techniques are challenging to understand, 
implement, or extend. We provide the first provably correct approach to querying 
nested heterogeneous collections involving both sets and multisets. 

Our most important insight is that working in a heterogeneous language, 
with both set and multiset collection types, actually makes the problem easier, 
by making it possible to calculate finite maps representing the behavior of nested 
query subexpressions under all of the possible environments encountered at run 
time. Thus, instead of having to maintain or synthesize keys linking inner and 
outer collections, as is done in all previous approaches, we can instead use the 
values of variables in the closures of nested query expressions themselves as 
the keys. The same approach can be used to eliminate sideways information- 
passing. This is analogous to lambda-lifting or closure conversion in compilation 
of functional languages, but differs in that we lift local queries to (queries that 
compute) finite maps rather than ordinary function abstractions. We believe 
this idea may have broader applications and will next investigate its behavior in 
practice and applications to other query language features. 
Abstract. We show how to define forward- and reverse-mode automatic
differentiation source-code transformations on a standard higher-order
functional language. The transformations generate purely functional code, 
and they are principled in the sense that their definition arises from a 
categorical universal property. We give a semantic proof of correctness of 
the transformations. In their most elegant formulation, the transforma- 
tions generate code with linear types. However, we demonstrate how the 
transformations can be implemented in a standard functional language 
without sacrificing correctness. To do so, we make use of abstract data 
types to represent the required linear types, e.g. through the use of a 
basic module system. 


Keywords: automatic differentiation · program correctness · semantics.


1 Introduction 


Automatic differentiation (AD) is a technique for transforming code that im- 
plements a function f into code that computes f’s derivative, essentially by 
using the chain rule for derivatives. Due to its efficiency and numerical stabil- 
ity, AD is the technique of choice whenever derivatives need to be computed 
of functions that are implemented as programs, particularly in high dimensional 
settings. Optimization and Monte-Carlo integration algorithms, such as gradient 
descent and Hamiltonian Monte-Carlo methods, rely crucially on the calculation 
of derivatives. These algorithms are used in virtually every machine learning and 
computational statistics application, and the calculation of derivatives is usually 
the computational bottleneck. These applications explain the recent surge of in- 
terest in AD, which has resulted in the proliferation of popular AD systems such 
as TensorFlow [1], PyTorch [30], and Stan Math [9]. 

AD, roughly speaking, comes in two modes: forward-mode and reverse-mode. 
When differentiating a function $\mathbb{R}^n \to \mathbb{R}^m$, forward-mode tends to be more efficient
if $m \gg n$, while reverse-mode generally is more performant if $n \gg m$.
As most applications reduce to optimization or Monte-Carlo integration of an
objective function $\mathbb{R}^n \to \mathbb{R}$ with $n$ very large (today, in the order of $10^4$ to $10^7$),
reverse-mode AD is in many ways the more interesting algorithm. 

However, reverse AD is also more complicated to understand and implement 
than forward AD. Forward AD can be implemented as a structure-preserving 
program transformation, even on languages with complex features [32]. As such, 
it admits an elegant proof of correctness [20]. By contrast, reverse-AD is only
well-understood as a source-code transformation (also called define-then-run 
style AD) on limited programming languages. Typically, its implementations 
on more expressive languages that have features such as higher-order functions 
make use of define-by-run approaches. These approaches first build a computa- 
tion graph during runtime, effectively evaluating the program until a straight-line 
first-order program is left, and then they evaluate this new program [30,9]. Such 
approaches have the severe downside that the differentiated code cannot bene- 
fit from existing optimizing compiler architectures. As such, these AD libraries 
need to be implemented using carefully, manually optimized code, that for exam- 
ple does not contain any common subexpressions. This implementation process 
is precarious and labour intensive. Further, some whole-program optimizations 
that a compiler would detect go entirely unused in such systems. 

Similarly, correctness proofs of reverse AD have taken a define-by-run ap- 
proach and have relied on non-standard operational semantics, using forms of 
symbolic execution [2,28,8]. Most work that treats reverse-AD as a source-code 
transformation does so by making use of complex transformations which intro- 
duce mutable state and/or non-local control flow [31,38]. As a result, we are not 
sure whether and why such techniques are correct. Another approach has been to 
compile high-level languages to a low-level imperative representation first, and 
then to perform AD at that level [22], using mutation and jumps. This approach 
has the downside that we might lose important opportunities for compiler opti- 
mizations, such as map-fusion and embarrassingly parallel maps, which we can 
exploit if we perform define-then-run AD on a high-level representation. 

A notable exception to these define-by-run and non-functional approaches to 
AD is [16], which presents an elegant, purely functional, define-then-run version 
of reverse AD. Unfortunately, their techniques are limited to first-order programs 
over tuples of real numbers. This paper extends the work of [16] to apply to 
higher-order programs over (primitive) arrays of reals: 


— It defines purely functional define-then-run reverse-mode AD on a higher- 
order language. 

— It shows how the resulting, mysterious looking program transformation arises 
from a universal property if we phrase the problem in a suitable categori- 
cal language. Consequently, the transformations automatically respect equa- 
tional reasoning principles. 

— It explains, from this categorical setting, precisely in what sense reverse AD 
is the “mirror image” of forward AD. 

— It presents an elegant proof of semantic correctness of the AD transforma- 
tions, based on a semantic logical relations argument, demonstrating that 
the transformations calculate the derivatives of the program in the usual 
mathematical sense. 

— It shows that the AD definitions and correctness proof are extensible to 
higher-order primitives such as a map-operation over our primitive arrays. 

— It discusses how our techniques are readily implementable in standard func- 
tional languages to give purely functional, principled, semantically correct, 
define-then-run reverse-mode AD. 
2 Key Ideas


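Consider a simple programming language. Types are statically sized arrays $\mathbf{real}^n$
for some $n$, and programs are obtained from a collection of (unary) primitive
operations $x : \mathbf{real}^n \vdash op(x) : \mathbf{real}^m$ (intended to implement differentiable
functions like linear algebra operations and sigmoid functions) by sequencing.
We can implement both forward mode AD $\overrightarrow{\mathcal{D}}$ and reverse mode AD $\overleftarrow{\mathcal{D}}$ on this
language as source-code translations to the larger language of a simply typed
$\lambda$-calculus over the ground types $\mathbf{real}^n$ that includes at least the same operations.
Forward (resp. reverse) AD translates a type $\tau$ to a pair of types
$\overrightarrow{\mathcal{D}}(\tau) = (\overrightarrow{\mathcal{D}}(\tau)_1, \overrightarrow{\mathcal{D}}(\tau)_2)$ (resp. $\overleftarrow{\mathcal{D}}(\tau) = (\overleftarrow{\mathcal{D}}(\tau)_1, \overleftarrow{\mathcal{D}}(\tau)_2)$), the first component for holding
function values, also called primals in the AD literature; the second component
for holding derivative values, also called tangents (resp. adjoints or cotangents):


$\overrightarrow{\mathcal{D}}(\mathbf{real}^n) = \overleftarrow{\mathcal{D}}(\mathbf{real}^n) = (\mathbf{real}^n, \mathbf{real}^n)$.

We translate terms $x : \tau \vdash t : \sigma$ to pairs of terms $\overrightarrow{\mathcal{D}}(t) = (\overrightarrow{\mathcal{D}}(t)_1, \overrightarrow{\mathcal{D}}(t)_2)$ for
forward AD and $\overleftarrow{\mathcal{D}}(t) = (\overleftarrow{\mathcal{D}}(t)_1, \overleftarrow{\mathcal{D}}(t)_2)$ for reverse AD, which have types


$x : \overrightarrow{\mathcal{D}}(\tau)_1 \vdash \overrightarrow{\mathcal{D}}(t)_1 : \overrightarrow{\mathcal{D}}(\sigma)_1$    and    $x : \overleftarrow{\mathcal{D}}(\tau)_1 \vdash \overleftarrow{\mathcal{D}}(t)_1 : \overleftarrow{\mathcal{D}}(\sigma)_1$
$x : \overrightarrow{\mathcal{D}}(\tau)_1 \vdash \overrightarrow{\mathcal{D}}(t)_2 : \overrightarrow{\mathcal{D}}(\tau)_2 \to \overrightarrow{\mathcal{D}}(\sigma)_2$    $x : \overleftarrow{\mathcal{D}}(\tau)_1 \vdash \overleftarrow{\mathcal{D}}(t)_2 : \overleftarrow{\mathcal{D}}(\sigma)_2 \to \overleftarrow{\mathcal{D}}(\tau)_2$.


$\overrightarrow{\mathcal{D}}(t)_1$ and $\overleftarrow{\mathcal{D}}(t)_1$ perform the primal computations for the program $t$, while $\overrightarrow{\mathcal{D}}(t)_2$
and $\overleftarrow{\mathcal{D}}(t)_2$ compute the derivatives, resp., for forward and reverse AD.
Indeed, we define, by induction on the syntax:

$\overrightarrow{\mathcal{D}}(x)_1 \stackrel{\mathrm{def}}{=} \overleftarrow{\mathcal{D}}(x)_1 \stackrel{\mathrm{def}}{=} x$    $\overrightarrow{\mathcal{D}}(x)_2 \stackrel{\mathrm{def}}{=} \overleftarrow{\mathcal{D}}(x)_2 \stackrel{\mathrm{def}}{=} \lambda y.y$
$\overrightarrow{\mathcal{D}}(op(t))_1 \stackrel{\mathrm{def}}{=} op(\overrightarrow{\mathcal{D}}(t)_1)$    $\overleftarrow{\mathcal{D}}(op(t))_1 \stackrel{\mathrm{def}}{=} op(\overleftarrow{\mathcal{D}}(t)_1)$
$\overrightarrow{\mathcal{D}}(op(t))_2 \stackrel{\mathrm{def}}{=} \lambda y.(Dop)(\overrightarrow{\mathcal{D}}(t)_1)(\overrightarrow{\mathcal{D}}(t)_2\,y)$    $\overleftarrow{\mathcal{D}}(op(t))_2 \stackrel{\mathrm{def}}{=} \lambda y.\overleftarrow{\mathcal{D}}(t)_2((Dop)^t(\overleftarrow{\mathcal{D}}(t)_1)\,y)$,


where we assume that we have chosen suitable terms $x : \mathbf{real}^n \vdash (Dop)(x) :
\mathbf{real}^n \to \mathbf{real}^m$ and $x : \mathbf{real}^n \vdash (Dop)^t(x) : \mathbf{real}^m \to \mathbf{real}^n$ to represent the
(multivariate) derivative and transposed (multivariate) derivative, respectively,
of the primitive operation $op : \mathbf{real}^n \to \mathbf{real}^m$.

For example, in case of multiplication $x : \mathbf{real}^2 \vdash op(x) = (*)(x) : \mathbf{real}$, we
can choose $D(*)(x) = \lambda y : \mathbf{real}^2.\,swap(x) \bullet y$ and $(D(*))^t(x) = \lambda y : \mathbf{real}.\,y \cdot swap(x)$,
where $swap$ is a unary operation on $\mathbf{real}^2$ that swaps both components,
$(\bullet)$ is a binary inner product operation on $\mathbf{real}^2$ and $(\cdot)$ is a binary scalar
product operation for rescaling a vector in $\mathbf{real}^2$ by a real number.

To illustrate the difference between $\overrightarrow{\mathcal{D}}$ and $\overleftarrow{\mathcal{D}}$, consider the program $t =
op_2(op_1(x))$ performing two operations in sequence. Then, $\overrightarrow{\mathcal{D}}(t)_1 = op_2(op_1(x)) =
\overleftarrow{\mathcal{D}}(t)_1$ and (after $\beta$-reducing, for legibility)


$\overrightarrow{\mathcal{D}}(t)_2 = \lambda y.(Dop_2)(op_1(x))((Dop_1)(x)(y))$
$\overleftarrow{\mathcal{D}}(t)_2 = \lambda y.(Dop_1)^t(x)((Dop_2)^t(op_1(x))(y))$.

In general, $\overrightarrow{\mathcal{D}}$ computes the derivative of a program that is a composition of
operations $op_1, \ldots, op_n$ as the composition $(Dop_1), \ldots, (Dop_n)$ of the (multivariate)
derivatives, in the same order as the original computation. By contrast, $\overleftarrow{\mathcal{D}}$
computes the transposed derivative of such a composition of $op_1, \ldots, op_n$ as
the composition of the transposed derivatives $(Dop_n)^t, \ldots, (Dop_1)^t$. Observe the
reversed order compared to the original composition!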
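
As an informal illustration of these definitions, the Python sketch below threads a (primal, derivative map) pair through a sequence of unary primitives. The encoding of a primitive as a triple `(op, Dop, Dop_t)` of plain Python functions on lists of floats, the names `forward` and `reverse`, and the elementwise-square primitive `SQUARE` are assumptions made for this example; only the multiplication primitive follows the choice of $D(*)$ and $(D(*))^t$ given in the text.

```python
def swap(v):
    return [v[1], v[0]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Each primitive is a triple (op, Dop, Dop_t); Dop(x) and Dop_t(x) are linear maps.
MUL = (
    lambda x: x[0] * x[1],
    lambda x: lambda y: dot(swap(x), y),          # derivative: real^2 -> real
    lambda x: lambda y: [y * c for c in swap(x)], # transposed: real -> real^2
)
SQUARE = (  # hypothetical extra primitive: elementwise square on real^2
    lambda x: [c * c for c in x],
    lambda x: lambda y: [2 * c * d for c, d in zip(x, y)],
    lambda x: lambda y: [2 * c * d for c, d in zip(x, y)],
)


def forward(ops, x):
    """Primal result and tangent map; derivatives compose in program order."""
    tangent = lambda y: y
    for op, d_op, _ in ops:
        tangent = (lambda prev, d: lambda y: d(prev(y)))(tangent, d_op(x))
        x = op(x)
    return x, tangent


def reverse(ops, x):
    """Primal result and cotangent map; transposes compose in reverse order."""
    cotangent = lambda y: y
    for op, _, d_op_t in ops:
        cotangent = (lambda prev, dt: lambda y: prev(dt(y)))(cotangent, d_op_t(x))
        x = op(x)
    return x, cotangent


program = [SQUARE, MUL]  # t = (*)(square(x)), i.e. x0^2 * x1^2
value_f, tangent = forward(program, [2.0, 3.0])
value_r, cotangent = reverse(program, [2.0, 3.0])
```

Here one call `cotangent(1.0)` yields the whole gradient, whereas the forward map has to be applied once per input direction, which is the asymmetry behind the preference for reverse mode when the input dimension is large.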
While this AD technique works on the limited first-order language we de- 
scribed, it is far from satisfying. Notably, it has the following two shortcomings: 
1. it does not tell us how to perform AD on programs that involve tuples or 
operations of multiple arguments; 
2. it does not tell us how to perform AD on higher-order programs, that is, 
programs involving $\lambda$-abstractions and applications.
The key contributions of this paper are its extension of this transformation (see 
§7) to apply to a full simply typed $\lambda$-calculus (of §3), and its proof that this
transformation is correct (see §8). 
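Shortcoming 1 seems easy to address, at first sight. Indeed, as the (co)tangent
vectors to a product of spaces are simply tuples of (co)tangent vectors, one would
expect to define, for a product type $\tau * \sigma$,


$\overrightarrow{\mathcal{D}}(\tau * \sigma) \stackrel{\mathrm{def}}{=} (\overrightarrow{\mathcal{D}}(\tau)_1 * \overrightarrow{\mathcal{D}}(\sigma)_1, \overrightarrow{\mathcal{D}}(\tau)_2 * \overrightarrow{\mathcal{D}}(\sigma)_2)$    $\overleftarrow{\mathcal{D}}(\tau * \sigma) \stackrel{\mathrm{def}}{=} (\overleftarrow{\mathcal{D}}(\tau)_1 * \overleftarrow{\mathcal{D}}(\sigma)_1, \overleftarrow{\mathcal{D}}(\tau)_2 * \overleftarrow{\mathcal{D}}(\sigma)_2)$.


Indeed, this technique straightforwardly applies to forward mode AD:


$\overrightarrow{\mathcal{D}}(\langle t, s \rangle) \stackrel{\mathrm{def}}{=} (\langle \overrightarrow{\mathcal{D}}(t)_1, \overrightarrow{\mathcal{D}}(s)_1 \rangle, \lambda y.\langle \overrightarrow{\mathcal{D}}(t)_2(y), \overrightarrow{\mathcal{D}}(s)_2(y) \rangle)$
$\overrightarrow{\mathcal{D}}(\mathbf{fst}\,t) \stackrel{\mathrm{def}}{=} (\mathbf{fst}\,\overrightarrow{\mathcal{D}}(t)_1, \lambda y.\mathbf{fst}\,\overrightarrow{\mathcal{D}}(t)_2(y))$    $\overrightarrow{\mathcal{D}}(\mathbf{snd}\,t) \stackrel{\mathrm{def}}{=} (\mathbf{snd}\,\overrightarrow{\mathcal{D}}(t)_1, \lambda y.\mathbf{snd}\,\overrightarrow{\mathcal{D}}(t)_2(y))$.


For reverse mode AD, however, tuples already present challenges. Indeed, we
would like to use the definitions below, but they require terms $\vdash 0 : \tau$ and
$t + s : \tau$ for any two $t, s : \tau$ for each type $\tau$:


$\overleftarrow{\mathcal{D}}(\langle t, s \rangle) \stackrel{\mathrm{def}}{=} (\langle \overleftarrow{\mathcal{D}}(t)_1, \overleftarrow{\mathcal{D}}(s)_1 \rangle, \lambda y.\overleftarrow{\mathcal{D}}(t)_2(\mathbf{fst}\,y) + \overleftarrow{\mathcal{D}}(s)_2(\mathbf{snd}\,y))$
$\overleftarrow{\mathcal{D}}(\mathbf{fst}\,t) \stackrel{\mathrm{def}}{=} (\mathbf{fst}\,\overleftarrow{\mathcal{D}}(t)_1, \lambda y.\overleftarrow{\mathcal{D}}(t)_2(\langle y, 0 \rangle))$
$\overleftarrow{\mathcal{D}}(\mathbf{snd}\,t) \stackrel{\mathrm{def}}{=} (\mathbf{snd}\,\overleftarrow{\mathcal{D}}(t)_1, \lambda y.\overleftarrow{\mathcal{D}}(t)_2(\langle 0, y \rangle))$.
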
These formulae capture the well-known issue of fanout translating to addition
in reverse AD, caused by the contravariance of its second component [31]. Such
$0$ and $+$ could indeed be defined by induction on the structure of types, using
$0$ and $+$ at $\mathbf{real}^n$. However, more problematically, $\langle -,-\rangle$, $\mathbf{fst}\,-$ and $\mathbf{snd}\,-$
represent explicit uses of structural rules of contraction and weakening at types $\tau$,
which, in a $\lambda$-calculus, can also be used implicitly in the typing context $\Gamma$. Thus,
we should also make these implicit uses explicit to account for their presence in
the code. Then, we can appropriately translate them into their “mirror image”:
we map the contraction-weakening comonoids to the monoid structures $(+, 0)$.
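As an illustration of how fanout turns into addition, the following sketch defines $0$ and $+$ by induction on types and uses them in reverse-mode rules for pairing and first projection. The tuple encoding of types, the representation of a translated term as a function returning a primal value together with a pullback, and all function names are assumptions made for this example only.

```python
# Hypothetical type encoding: ("real", n), ("unit",) or ("prod", ty1, ty2).
R2 = ("real", 2)


def zero(ty):
    """The zero cotangent at a type, by induction on the type."""
    if ty[0] == "real":
        return [0.0] * ty[1]
    if ty[0] == "unit":
        return ()
    if ty[0] == "prod":
        return (zero(ty[1]), zero(ty[2]))
    raise ValueError(f"unknown type {ty!r}")


def plus(ty, a, b):
    """Addition of cotangents at a type, by induction on the type."""
    if ty[0] == "real":
        return [x + y for x, y in zip(a, b)]
    if ty[0] == "unit":
        return ()
    if ty[0] == "prod":
        return (plus(ty[1], a[0], b[0]), plus(ty[2], a[1], b[1]))
    raise ValueError(f"unknown type {ty!r}")


# A translated term maps a primal input to (primal output, pullback), where the
# pullback sends a cotangent of the output to a cotangent of the input.
def rev_var(x):
    return x, lambda y: y


def rev_pair(ty_in, dt, ds):
    """Reverse derivative of <t, s>: the two pullbacks are added (fanout)."""
    def run(x):
        t1, t2 = dt(x)
        s1, s2 = ds(x)
        return (t1, s1), lambda y: plus(ty_in, t2(y[0]), s2(y[1]))
    return run


def rev_fst(ty_snd, dt):
    """Reverse derivative of fst t: the discarded component receives zero."""
    def run(x):
        t1, t2 = dt(x)
        return t1[0], lambda y: t2((y, zero(ty_snd)))
    return run


duplicate = rev_pair(R2, rev_var, rev_var)   # x |-> <x, x>  (contraction)
first = rev_fst(R2, rev_var)                 # p |-> fst p   (weakening)
```

Duplicating a variable makes the pullback add the two incoming cotangents, while projecting pads the discarded component with zero.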


Insight 1. In functional define-then-run reverse AD, we need to make use of
explicit structural rules and “mirror them”, which we can do by first translating
our language into combinators. This translation allows us to avoid the usual
practice (e.g. [88]) of accumulating adjoints at run-time with mutable state:
instead, we detect all adjoints to accumulate at compile-time.


Put differently: we define AD on the syntactic category Syn with types $\tau$ as objects
and $(\alpha)\beta\eta$-equivalence classes of programs $x : \tau \vdash t : \sigma$ as morphisms $\tau \to \sigma$.

Yet the question remains: why should this translation for tuples be correct?
What is even less clear is how to address shortcoming 2. What should the spaces
of tangents $\overrightarrow{\mathcal{D}}(\tau \to \sigma)_2$ and adjoints $\overleftarrow{\mathcal{D}}(\tau \to \sigma)_2$ look like? This is not something
we are taught in Calculus 1.01. Instead, we again employ category theory:


Insight 2. Follow where the categorical structure of the syntax leads you, as 
doing so produces principled definitions that are easy to prove correct. 


With the aim of categorical compositionality in mind, we note that our trans- 
lations compose according to a sort of “syntactic chain-rule”, which says that 


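$\overrightarrow{\mathcal{D}}(t[^{s}/_{x}]) \stackrel{\mathrm{def}}{=} (\overrightarrow{\mathcal{D}}(t)_1[^{\overrightarrow{\mathcal{D}}(s)_1}/_{x}],\ \lambda y.\overrightarrow{\mathcal{D}}(t)_2[^{\overrightarrow{\mathcal{D}}(s)_1}/_{x}](\overrightarrow{\mathcal{D}}(s)_2(y)))$

$\overleftarrow{\mathcal{D}}(t[^{s}/_{x}]) \stackrel{\mathrm{def}}{=} (\overleftarrow{\mathcal{D}}(t)_1[^{\overleftarrow{\mathcal{D}}(s)_1}/_{x}],\ \lambda y.\overleftarrow{\mathcal{D}}(s)_2(\overleftarrow{\mathcal{D}}(t)_2[^{\overleftarrow{\mathcal{D}}(s)_1}/_{x}](y)))$.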
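By the following trick, these equations are functoriality laws. Given a Cartesian
closed category $(\mathcal{C}, \mathbf{1}, \times, \Rightarrow)$, define categories $\overrightarrow{\mathcal{D}}[\mathcal{C}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{C}]$ as having objects
pairs $(A_1, A_2)$ of objects $A_1, A_2$ of $\mathcal{C}$ and morphisms

$\overrightarrow{\mathcal{D}}[\mathcal{C}]((A_1, A_2), (B_1, B_2)) \stackrel{\mathrm{def}}{=} \mathcal{C}(A_1, B_1) \times \mathcal{C}(A_1, A_2 \Rightarrow B_2)$

$\overleftarrow{\mathcal{D}}[\mathcal{C}]((A_1, A_2), (B_1, B_2)) \stackrel{\mathrm{def}}{=} \mathcal{C}(A_1, B_1) \times \mathcal{C}(A_1, B_2 \Rightarrow A_2)$.

Both have identities $\mathrm{id}_{(A_1, A_2)} \stackrel{\mathrm{def}}{=} (\mathrm{id}_{A_1}, \Lambda(\pi_2))$, where we write $\Lambda$ for categorical
currying and $\pi_2$ for the second projection. Composition in $\overrightarrow{\mathcal{D}}[\mathcal{C}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{C}]$,
respectively, of $(A_1, A_2) \xrightarrow{(k_1, k_2)} (B_1, B_2) \xrightarrow{(l_1, l_2)} (C_1, C_2)$ are

$(k_1, k_2); (l_1, l_2) \stackrel{\mathrm{def}}{=} (k_1; l_1,\ \lambda a_1 : A_1.\lambda a_2 : A_2.\,l_2(k_1(a_1))(k_2(a_1, a_2)))$

$(k_1, k_2); (l_1, l_2) \stackrel{\mathrm{def}}{=} (k_1; l_1,\ \lambda a_1 : A_1.\lambda c_2 : C_2.\,k_2(a_1)(l_2(k_1(a_1), c_2)))$,

where we work in the internal language of $\mathcal{C}$. Then, we have defined two functors:
$\overrightarrow{\mathcal{D}} : \mathbf{Syn}_1 \to \overrightarrow{\mathcal{D}}[\mathbf{Syn}]$ and $\overleftarrow{\mathcal{D}} : \mathbf{Syn}_1 \to \overleftarrow{\mathcal{D}}[\mathbf{Syn}]$,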


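where we write $\mathbf{Syn}_1$ for the syntactic category of our restrictive first-order
language, and we write Syn for that of the full $\lambda$-calculus. We would like to
extend these to functors

$\mathbf{Syn} \to \overrightarrow{\mathcal{D}}[\mathbf{Syn}]$ and $\mathbf{Syn} \to \overleftarrow{\mathcal{D}}[\mathbf{Syn}]$.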
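The two composition laws given above for pairs $(k_1, k_2)$ and $(l_1, l_2)$ can be sketched in Python, taking ordinary functions in place of the morphisms of $\mathcal{C}$. This is only an illustration with curried second components; the example morphisms `k_fwd`, `l_fwd`, `k_rev` and `l_rev` are hypothetical and hand-derived.

```python
def compose_forward(k, l):
    """Composition in the 'forward' category: second components compose covariantly."""
    k1, k2 = k
    l1, l2 = l
    return (lambda a1: l1(k1(a1)),
            lambda a1: lambda a2: l2(k1(a1))(k2(a1)(a2)))


def compose_reverse(k, l):
    """Composition in the 'reverse' category: second components compose contravariantly."""
    k1, k2 = k
    l1, l2 = l
    return (lambda a1: l1(k1(a1)),
            lambda a1: lambda c2: k2(a1)(l2(k1(a1))(c2)))


# The identity pair (id, Lambda(pi_2)) is the same in both categories.
IDENTITY = (lambda a1: a1, lambda a1: lambda a2: a2)

# k : x |-> (x, x^2) and l : (a, b) |-> a * b, so that k;l is x |-> x^3.
k_fwd = (lambda x: (x, x * x), lambda x: lambda v: (v, 2.0 * x * v))
l_fwd = (lambda p: p[0] * p[1], lambda p: lambda dp: p[1] * dp[0] + p[0] * dp[1])
k_rev = (lambda x: (x, x * x), lambda x: lambda w: w[0] + 2.0 * x * w[1])
l_rev = (lambda p: p[0] * p[1], lambda p: lambda w: (p[1] * w, p[0] * w))

cube_fwd = compose_forward(k_fwd, l_fwd)
cube_rev = compose_reverse(k_rev, l_rev)
```

Both composites compute the derivative $3x^2$ of $x \mapsto x^3$, but the reverse composite applies the pullback of `l` before that of `k`.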


$\overrightarrow{\mathcal{D}}[\mathcal{C}]$ turns out to be a category with finite products, given by $(A_1, A_2) \times (B_1, B_2) =
(A_1 \times B_1, A_2 \times B_2)$. Thus, we can easily extend $\overrightarrow{\mathcal{D}}$ to apply to an extension of
$\mathbf{Syn}_1$ with tuples by extending the functor in the unique structure-preserving
way. However, $\overleftarrow{\mathcal{D}}[\mathbf{Syn}]$ does not have products and neither $\overrightarrow{\mathcal{D}}[\mathbf{Syn}]$ nor $\overleftarrow{\mathcal{D}}[\mathbf{Syn}]$
supports function types. (The reason turns out to be that not all functions are
linear in the sense of respecting $0$ and $+$.) Therefore, the categorical structure
does not give us guidance on how to extend our translation to all of Syn.


Insight 3. Linear types can help. By using a more fine-grained type system, we 
can capture the linearity of the derivative. As a result, we can phrase AD on our 
full language simply as the unique structure-preserving functor that extends the 
uncontroversial definitions given so far. 


To implement this insight, we extend our $\lambda$-calculus to a language LSyn with
limited linear types (in §4): linear function types $\multimap$ and a kind of multiplicative
conjunction $!(-) \otimes (-)$, in the sense of the enriched effect calculus [14]. The
algebraic effect giving rise to these linear types, in this instance, is that of the
theory of commutative monoids. As we have seen, such monoids are intimately
related to reverse AD. Consequently, we demand that every $f$ with a linear
function type $\tau \multimap \sigma$ is indeed linear, in the sense that $f\,0 = 0$ and $f\,(t + s) =
(f\,t) + (f\,s)$. For the categorically inclined reader: that is, we enrich LSyn over
the category of commutative monoids.

Now, we can give more precise types to our derivatives, as we know they are
linear functions: for $x : \tau \vdash t : \sigma$, we have $x : \overrightarrow{\mathcal{D}}(\tau)_1 \vdash \overrightarrow{\mathcal{D}}(t)_2 : \overrightarrow{\mathcal{D}}(\tau)_2 \multimap \overrightarrow{\mathcal{D}}(\sigma)_2$
and $x : \overleftarrow{\mathcal{D}}(\tau)_1 \vdash \overleftarrow{\mathcal{D}}(t)_2 : \overleftarrow{\mathcal{D}}(\sigma)_2 \multimap \overleftarrow{\mathcal{D}}(\tau)_2$. Therefore, given any model $\mathcal{L}$ of
our linear type theory, we generalise our previous construction of the categories
$\overrightarrow{\mathcal{D}}[\mathcal{L}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{L}]$, but now we work with linear functions in the second component.
Unlike before, both $\overrightarrow{\mathcal{D}}[\mathcal{L}]$ and $\overleftarrow{\mathcal{D}}[\mathcal{L}]$ are now Cartesian closed (by §6)!

Thus, we find the following corollary, by the universal property of Syn. This
property states that any well-typed choice of interpretations $F(\mathrm{op})$ of the primitive
operations in a Cartesian closed category $\mathcal{C}$ extends to a unique Cartesian
closed functor $F : \mathbf{Syn} \to \mathcal{C}$. It gives a principled definition of AD and explains
in what sense reverse AD is the “mirror image” of forward AD.


Corollary (Definition of AD, §7). Once we fix the interpretation of the primitive
operations op to their respective derivatives and transposed derivatives, we
obtain unique structure-preserving forward and reverse AD functors
$\overrightarrow{\mathcal{D}} : \mathbf{Syn} \to \overrightarrow{\mathcal{D}}[\mathbf{LSyn}]$ and $\overleftarrow{\mathcal{D}} : \mathbf{Syn} \to \overleftarrow{\mathcal{D}}[\mathbf{LSyn}]$.


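In particular, the following definitions are forced on us by the theory:

$\overrightarrow{\mathcal{D}}(\tau \to \sigma) \stackrel{\mathrm{def}}{=} (\overrightarrow{\mathcal{D}}(\tau)_1 \to (\overrightarrow{\mathcal{D}}(\sigma)_1 * (\overrightarrow{\mathcal{D}}(\tau)_2 \multimap \overrightarrow{\mathcal{D}}(\sigma)_2)),\ \overrightarrow{\mathcal{D}}(\tau)_1 \to \overrightarrow{\mathcal{D}}(\sigma)_2)$

$\overleftarrow{\mathcal{D}}(\tau \to \sigma) \stackrel{\mathrm{def}}{=} (\overleftarrow{\mathcal{D}}(\tau)_1 \to (\overleftarrow{\mathcal{D}}(\sigma)_1 * (\overleftarrow{\mathcal{D}}(\sigma)_2 \multimap \overleftarrow{\mathcal{D}}(\tau)_2)),\ !\overleftarrow{\mathcal{D}}(\tau)_1 \otimes \overleftarrow{\mathcal{D}}(\sigma)_2)$


Insight 4. For reverse AD, an adjoint at function type $\tau \to \sigma$ needs to keep
track of the incoming adjoints $v$ of type $\overleftarrow{\mathcal{D}}(\sigma)_2$ for each primal $x$ of type
$\overleftarrow{\mathcal{D}}(\tau)_1$ on which we call the function. We store these pairs $(x, v)$ in the type
$!\overleftarrow{\mathcal{D}}(\tau)_1 \otimes \overleftarrow{\mathcal{D}}(\sigma)_2$ (which we will see is essentially a quotient of a list of pairs of
type $\overleftarrow{\mathcal{D}}(\tau)_1 * \overleftarrow{\mathcal{D}}(\sigma)_2$). Less surprisingly, for forward AD, a tangent at function
type $\tau \to \sigma$ consists of a function sending each argument primal of type $\overrightarrow{\mathcal{D}}(\tau)_1$
to the outgoing tangent of type $\overrightarrow{\mathcal{D}}(\sigma)_2$.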


With these definitions in place, we turn to the correctness of the source-code 
transformations. To phrase correctness, we first need to construct a suitable de- 
notational semantics with an uncontroversial notion of semantic differentiation. 
A technical challenge arises, as the usual calculus setting of Euclidean spaces 
(or manifolds) and smooth functions cannot interpret higher-order functions. 
To solve this problem, we work with a conservative extension of this standard 
calculus setting (see §5): the category Diff of diffeological spaces. We model 
our types as diffeological spaces, and programs as smooth functions. By keeping 
track of a commutative monoid structure on these spaces, we are also able to 
interpret the required linear types. We write $\mathbf{Diff}_{CM}$ for this “linear” category
of commutative diffeological monoids and smooth monoid homomorphisms.
By the universal properties of the syntax, we obtain canonical, structure-preserving
functors $\llbracket - \rrbracket : \mathbf{LSyn} \to \mathbf{Diff}_{CM}$ and $\llbracket - \rrbracket : \mathbf{Syn} \to \mathbf{Diff}$ once we fix
interpretations $\mathbb{R}^n$ of $\mathbf{real}^n$ and well-typed interpretations $\llbracket \mathrm{op} \rrbracket$ for each operation
op. These functors define a semantics for our language.

Having constructed the semantics, we can turn to the correctness proof (of
§8). Because calculus does not provide an unambiguous notion of derivative at
function spaces, we cannot prove that the AD transformations correctly imple- 
ment mathematical derivatives by plain induction on the syntax. Instead, we use 
a logical relations argument over the semantics, which we phrase categorically: 


Insight 5. Once we show that the derivatives of primitive operations op are 
correctly implemented, correctness of derivatives of other programs follows from 
a standard logical relations construction over the semantics that relates a curve to 
its (co)tangent curve. By the chain-rule, all programs respect the logical relations. 


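To show correctness of forward AD, we construct a category SScone whose
objects are triples $((X, (Y_1, Y_2)), P)$ of an object $X$ of Diff, an object $(Y_1, Y_2)$
of $\overrightarrow{\mathcal{D}}[\mathbf{Diff}_{CM}]$ and a predicate $P$ on $\mathbf{Diff}(\mathbb{R}, X) \times \overrightarrow{\mathcal{D}}[\mathbf{Diff}_{CM}]((\mathbb{R}, \mathbb{R}), (Y_1, Y_2))$.
It has morphisms $((X, (Y_1, Y_2)), P) \xrightarrow{(f, (g, h))} ((X', (Y_1', Y_2')), P')$, which are a
pair of morphisms $X \xrightarrow{f} X'$ and $(Y_1, Y_2) \xrightarrow{(g, h)} (Y_1', Y_2')$ such that for any
$(\gamma, (\delta_1, \delta_2)) \in P$, we have that $(\gamma; f, (\delta_1, \delta_2); (g, h)) \in P'$. SScone is a standard
category of logical relations, or subscone, and it is widely known to inherit the
Cartesian closure of $\mathbf{Diff} \times \overrightarrow{\mathcal{D}}[\mathbf{Diff}_{CM}]$ (see §§8.1). It also comes equipped with
a Cartesian closed functor $\mathbf{SScone} \to \mathbf{Diff} \times \overrightarrow{\mathcal{D}}[\mathbf{Diff}_{CM}]$. Therefore, once we fix
predicates $P^f_{\mathbf{real}^n}$ on $(\llbracket - \rrbracket, \overrightarrow{\mathcal{D}}[\llbracket - \rrbracket])(\mathbf{real}^n)$ and show that all operations op respect
these predicates, it follows that our denotational semantics lifts to give a unique
structure-preserving functor $\mathbf{Syn} \xrightarrow{(\!|-|\!)^f} \mathbf{SScone}$, such that the left diagram below
commutes (by the universal property of Syn).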


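$\begin{array}{ccc}
\mathbf{Syn} & \xrightarrow{\ (\mathrm{id},\,\overrightarrow{\mathcal{D}})\ } & \mathbf{Syn} \times \overrightarrow{\mathcal{D}}[\mathbf{LSyn}] \\
{\scriptstyle (\!|-|\!)^f}\downarrow & & \downarrow{\scriptstyle \llbracket - \rrbracket \times \overrightarrow{\mathcal{D}}[\llbracket - \rrbracket]} \\
\mathbf{SScone} & \longrightarrow & \mathbf{Diff} \times \overrightarrow{\mathcal{D}}[\mathbf{Diff}_{CM}]
\end{array}$

$\begin{array}{ccc}
\mathbf{Syn} & \xrightarrow{\ (\mathrm{id},\,\overleftarrow{\mathcal{D}})\ } & \mathbf{Syn} \times \overleftarrow{\mathcal{D}}[\mathbf{LSyn}] \\
{\scriptstyle (\!|-|\!)^r}\downarrow & & \downarrow{\scriptstyle \llbracket - \rrbracket \times \overleftarrow{\mathcal{D}}[\llbracket - \rrbracket]} \\
\mathbf{SScone} & \longrightarrow & \mathbf{Diff} \times \overleftarrow{\mathcal{D}}[\mathbf{Diff}_{CM}]
\end{array}$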


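Consequently, we can work with $P^f_{\mathbf{real}^n} \stackrel{\mathrm{def}}{=} \{(f, (g, h)) \mid g = f \text{ and } h = Df\}$,
where we write $Df(x)(v)$ for the multivariate calculus derivative of $f$ at a point
$x$ evaluated at a tangent vector $v$. By an application of the chain rule for
differentiation, we see that every op respects this predicate, as long as $\llbracket D\mathrm{op} \rrbracket = D\llbracket \mathrm{op} \rrbracket$.
The commuting of our diagram then virtually establishes the correctness of
forward AD. The only remaining step in the argument is to note that any
tangent vector at $\llbracket \tau \rrbracket \cong \mathbb{R}^N$, for first-order $\tau$, can be represented by a curve
$\mathbb{R} \to \llbracket \tau \rrbracket$. For reverse AD, the same construction works, if $\llbracket D\mathrm{op}^t \rrbracket = D\llbracket \mathrm{op} \rrbracket^t$,
by replacing $\overrightarrow{\mathcal{D}}[-]$ with $\overleftarrow{\mathcal{D}}[-]$ and $\overrightarrow{\mathcal{D}}$ with $\overleftarrow{\mathcal{D}}$. We can then choose
$P^r_{\mathbf{real}^n} \stackrel{\mathrm{def}}{=} \{(f, (g, h)) \mid g = f \text{ and } h = x \mapsto (Df(x))^t\}$ as the predicates for constructing
$(\!|\mathbf{real}^n|\!)^r$, where we write $A^t$ for the matrix transpose of $A$. We obtain our main
theorem, which crucially holds even for $t$ that involve higher-order subprograms.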
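Theorem (Correctness of AD, Thm. 1). For any typed term $x : \tau \vdash t : \sigma$ in
Syn between first-order types $\tau, \sigma$, we have that

$\llbracket \overrightarrow{\mathcal{D}}(t)_2 \rrbracket(x) = D\llbracket t \rrbracket(x)$ and $\llbracket \overleftarrow{\mathcal{D}}(t)_2 \rrbracket(x) = D\llbracket t \rrbracket(x)^t$.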
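A statement of this shape can be sanity-checked numerically on first-order examples. The sketch below is not the paper's proof technique: it takes a hypothetical program `f`, hand-written stand-ins for the outputs of forward and reverse AD, and compares them against central finite differences and against each other via the transpose identity $\langle Df(x)\,v, w\rangle = \langle v, Df(x)^t\,w\rangle$.

```python
import math


def f(x):
    """A hypothetical first-order program R^2 -> R^2."""
    return [x[0] * x[1], math.sin(x[0])]


def forward_derivative(x, v):
    """Hand-written Df(x)(v), standing in for the forward-mode output."""
    return [x[1] * v[0] + x[0] * v[1], math.cos(x[0]) * v[0]]


def reverse_derivative(x, w):
    """Hand-written Df(x)^t(w), standing in for the reverse-mode output."""
    return [x[1] * w[0] + math.cos(x[0]) * w[1], x[0] * w[0]]


def finite_difference(fun, x, v, eps=1e-6):
    """Central-difference approximation of the directional derivative Df(x)(v)."""
    hi = fun([xi + eps * vi for xi, vi in zip(x, v)])
    lo = fun([xi - eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / (2.0 * eps) for a, b in zip(hi, lo)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


x, v, w = [0.5, -1.5], [1.0, 2.0], [3.0, -4.0]
fd = finite_difference(f, x, v)
fwd = forward_derivative(x, v)
rev = reverse_derivative(x, w)
```

Such a check only tests a few points and directions, whereas the theorem covers all inputs and all programs, including those with higher-order subprograms.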


Next, we address the practicality of our method (in §9). The code transfor- 
mations we employ are not too daunting to implement. It is well-known how to 
mechanically translate $\lambda$-calculus and functional languages into a (categorical)
combinatory form [12]. However, the implementation of the required linear types
presents a challenge. Indeed, types like $!(-) \otimes (-)$ and $(-) \multimap (-)$ are absent
from languages such as Haskell and O’Caml. Luckily, in this instance, we can 
implement them using abstract data types by using a (basic) module system: 


Insight 6. Under the hood, $!\tau \otimes \sigma$ can consist of a list of values of type $\tau * \sigma$.
Its API ensures that the list order and the difference between xs ++ [(t, s), (t, s')]
++ ys and xs ++ [(t, s + s')] ++ ys cannot be observed: as such, it is a quotient
type. Meanwhile, $\tau \multimap \sigma$ can be implemented as a standard function type $\tau \to \sigma$
with a limited API that enforces that we can only ever construct linear functions:
as such, it is a subtype.
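The idea of Insight 6 can be sketched with two small classes. Python has no module system that enforces abstraction, so the leading underscores are only a convention here. The class names `CopowerTensor` and `LinearFunction` and their methods are hypothetical and do not come from the paper's implementation.

```python
class CopowerTensor:
    """Hypothetical stand-in for !tau (x) sigma: a hidden list of (primal, cotangent) pairs."""

    def __init__(self, pairs=()):
        self._pairs = list(pairs)      # private: clients should only use the API below

    @classmethod
    def zero(cls):
        return cls()

    @classmethod
    def singleton(cls, primal, cotangent):
        return cls([(primal, cotangent)])

    def plus(self, other):
        return CopowerTensor(self._pairs + other._pairs)

    def eliminate(self, body, zero, plus):
        """Sum body(primal, cotangent) over all pairs.

        If body is linear in its second argument and plus is a commutative
        monoid operation, the result depends neither on the list order nor on
        whether two pairs with the same primal have been merged.
        """
        total = zero
        for primal, cotangent in self._pairs:
            total = plus(total, body(primal, cotangent))
        return total


class LinearFunction:
    """Hypothetical stand-in for tau -o sigma: only linear maps can be built."""

    def __init__(self, _fn):
        self._fn = _fn                 # intended to be called only by the constructors below

    @classmethod
    def scale(cls, factor):
        return cls(lambda v: factor * v)

    def then(self, other):
        return LinearFunction(lambda v: other._fn(self._fn(v)))

    def add(self, other):
        return LinearFunction(lambda v: self._fn(v) + other._fn(v))

    def apply(self, v):
        return self._fn(v)


def body(primal, cotangent):           # linear in `cotangent`
    return primal * cotangent


add = lambda a, b: a + b
split = CopowerTensor.singleton(2.0, 1.0).plus(CopowerTensor.singleton(2.0, 3.0))
merged = CopowerTensor.singleton(2.0, 4.0)
swapped = CopowerTensor.singleton(2.0, 3.0).plus(CopowerTensor.singleton(2.0, 1.0))
```

Because `eliminate` is the only way to consume a `CopowerTensor`, the three values `split`, `merged` and `swapped` cannot be told apart by a client that passes a linear `body`.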


We phrase the correctness proof of the AD transformations in elementary 
terms, such that it holds in the applied setting where we use abstract types to 
implement linear types. We show that our correctness results are meaningful, as 
they make use of a denotational semantics that is adequate with respect to the 
standard operational semantics. Finally, to stress the applicability of our method, 
we show that it extends to higher-order (primitive) operations, such as map. 


3 λ-Calculus as a Source Language for AD


As a source language for our AD translations, we can begin with a standard,
simply typed $\lambda$-calculus which has ground types $\mathbf{real}^n$ of statically sized arrays
of $n$ real numbers, for all $n \in \mathbb{N}$, and sets $\mathrm{Op}^m_{n_1,\ldots,n_k}$ of primitive operations
op for all $k, m, n_1, \ldots, n_k \in \mathbb{N}$. These operations will be interpreted as smooth
functions $(\mathbb{R}^{n_1} \times \ldots \times \mathbb{R}^{n_k}) \to \mathbb{R}^m$. Examples to keep in mind for op include


— constants $c \in \mathrm{Op}^n$ for each $c \in \mathbb{R}^n$, for which we slightly abuse notation and
write $c(\langle\rangle)$ as $c$;

— elementwise addition and product $(+), (*) \in \mathrm{Op}^n_{n,n}$ and matrix-vector product
$(\star) \in \mathrm{Op}^n_{n \cdot m, m}$;

— operations for summing all the elements in an array: $\mathrm{sum} \in \mathrm{Op}^1_n$;

— some non-linear functions like the sigmoid function $\varsigma \in \mathrm{Op}^1_1$.


We intentionally present operations in a schematic way, as primitive operations 
tend to form a collection that is added to in a by-need fashion, as an AD library 
develops. The precise operations needed will depend on the applications, but, 
in statistics and machine learning applications, Op tends to include a mix of 
multi-dimensional linear algebra operations and mostly one-dimensional non- 
linear functions. A typical library for use in machine learning would work with 
multi-dimensional arrays (sometimes called “tensors”). We focus here on one-
dimensional arrays as the issues of how precisely to represent the arrays are
orthogonal to the concerns of our development.
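To make the schematic signature $\mathrm{Op}^m_{n_1,\ldots,n_k}$ concrete, here is a minimal sketch of a table of primitive operations on one-dimensional arrays, with arity and size checks. The `Op` record, its field names and the row-major storage of matrices are assumptions made for this illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Op:
    """A primitive operation in Op^m_{n_1,...,n_k} (hypothetical record)."""
    name: str
    in_dims: Tuple[int, ...]     # (n_1, ..., n_k)
    out_dim: int                 # m
    fn: Callable[..., List[float]]

    def __call__(self, *args):
        if tuple(len(a) for a in args) != self.in_dims:
            raise TypeError(f"{self.name}: expected arrays of sizes {self.in_dims}")
        result = self.fn(*args)
        if len(result) != self.out_dim:
            raise TypeError(f"{self.name}: expected a result of size {self.out_dim}")
        return result


def constant(c):
    return Op("const", (), len(c), lambda: list(c))


def add(n):
    return Op("add", (n, n), n, lambda x, y: [a + b for a, b in zip(x, y)])


def matvec(n, m):
    """Matrix-vector product; the n-by-m matrix is assumed stored row-major."""
    def fn(mat, vec):
        return [sum(mat[i * m + j] * vec[j] for j in range(m)) for i in range(n)]
    return Op("matvec", (n * m, m), n, fn)


def total(n):
    return Op("sum", (n,), 1, lambda x: [sum(x)])


sigmoid = Op("sigmoid", (1,), 1, lambda x: [1.0 / (1.0 + math.exp(-x[0]))])
```

New operations can be added to such a table as the need arises, which matches the by-need growth of the collection of primitives described above.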


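The types $\tau, \sigma, \rho$ and terms $t, s, r$ of our AD source language are as follows:


$\tau, \sigma, \rho$ ::=                              types
      | $\mathbf{real}^n$                             real arrays
      | $\mathbf{1}$                                  nullary product
      | $\tau_1 * \tau_2$                             binary product
      | $\tau \to \sigma$                             function

$t, s, r$ ::=                                         terms
      | $x$                                           variable
      | $\mathrm{op}(t)$                              operations
      | $\langle\rangle$ | $\langle t, s\rangle$      product tuples
      | $\mathbf{fst}\ t$ | $\mathbf{snd}\ t$         product projections
      | $\lambda x.t$                                 function abstraction
      | $t\ s$                                        function application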


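The typing rules are in Fig. 1, where we write $\mathrm{Dom}(\mathrm{op}) \stackrel{\mathrm{def}}{=} \mathbf{real}^{n_1} * \ldots * \mathbf{real}^{n_k}$
for an operation $\mathrm{op} \in \mathrm{Op}^m_{n_1,\ldots,n_k}$. We employ the usual syntactic sugar
$\mathbf{let}\ x = t\ \mathbf{in}\ s \stackrel{\mathrm{def}}{=} (\lambda x.s)\ t$ and write $\mathbf{real}$ for $\mathbf{real}^1$. As Fig. 2 displays, we consider the
terms of our language up to the standard $\beta\eta$-theory. We could consider further
equations for our operations, but we do not as we will not need them.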

This standard $\lambda$-calculus is widely known to be equivalent to the free Cartesian
closed category Syn generated by the objects $\mathbf{real}^n$ and the morphisms op.
Syn effectively represents programs as (categorical) combinators, also known as
“point-free style” in the functional programming community. Indeed, there are
well-studied mechanical translations from the $\lambda$-calculus to the free Cartesian
closed category (and back) [26,13]. The translation from Syn to $\lambda$-calculus is
self-evident, while the translation in the opposite direction is straightforward
after we first convert our $\lambda$-terms to de Bruijn indexed form. Concretely,


— Syn has types $\tau, \sigma, \rho$ as objects;
— Syn has morphisms $t \in \mathbf{Syn}(\tau, \sigma)$ which are in 1-1 correspondence with
terms $x : \tau \vdash t : \sigma$ up to $\beta\eta$-equivalence (which includes $\alpha$-equivalence);
explicitly, they can be represented by

• identities: $\mathrm{id}_\tau \in \mathbf{Syn}(\tau, \tau)$ (cf. variables up to $\alpha$-equivalence);
• composition: $t; s \in \mathbf{Syn}(\tau, \rho)$ for any $t \in \mathbf{Syn}(\tau, \sigma)$ and $s \in \mathbf{Syn}(\sigma, \rho)$
(corresponding to the capture avoiding substitution $s[^{t}/_{y}]$ if we represent
$x : \tau \vdash t : \sigma$ and $y : \sigma \vdash s : \rho$);

• terminal morphisms: $\langle\rangle_\tau \in \mathbf{Syn}(\tau, \mathbf{1})$;

• product pairing: $\langle t, s\rangle \in \mathbf{Syn}(\tau, \sigma * \rho)$ for any $t \in \mathbf{Syn}(\tau, \sigma)$ and $s \in
\mathbf{Syn}(\tau, \rho)$;

• product projections: $\mathbf{fst}_{\tau,\sigma} \in \mathbf{Syn}(\tau * \sigma, \tau)$ and $\mathbf{snd}_{\tau,\sigma} \in \mathbf{Syn}(\tau * \sigma, \sigma)$;


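$\dfrac{((x : \tau) \in \Gamma)}{\Gamma \vdash x : \tau}$
$\dfrac{\Gamma \vdash t : \mathrm{Dom}(\mathrm{op}) \quad (\mathrm{op} \in \mathrm{Op}^m_{n_1,\ldots,n_k})}{\Gamma \vdash \mathrm{op}(t) : \mathbf{real}^m}$
$\dfrac{}{\Gamma \vdash \langle\rangle : \mathbf{1}}$
$\dfrac{\Gamma \vdash t : \tau \quad \Gamma \vdash s : \sigma}{\Gamma \vdash \langle t, s\rangle : \tau * \sigma}$

$\dfrac{\Gamma \vdash t : \tau * \sigma}{\Gamma \vdash \mathbf{fst}\ t : \tau}$
$\dfrac{\Gamma \vdash t : \tau * \sigma}{\Gamma \vdash \mathbf{snd}\ t : \sigma}$
$\dfrac{\Gamma, x : \tau \vdash t : \sigma}{\Gamma \vdash \lambda x.t : \tau \to \sigma}$
$\dfrac{\Gamma \vdash t : \sigma \to \tau \quad \Gamma \vdash s : \sigma}{\Gamma \vdash t\ s : \tau}$


Fig. 1. Typing rules for the AD source language.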
$t = \langle\rangle$    $\mathbf{fst}\ \langle t, s\rangle = t$    $\mathbf{snd}\ \langle t, s\rangle = s$    $t = \langle \mathbf{fst}\ t, \mathbf{snd}\ t\rangle$    $(\lambda x.t)\ s = t[^{s}/_{x}]$    $t \stackrel{\#x}{=} \lambda x.t\ x$


Fig. 2. Standard $\beta\eta$-laws for products and functions. We write $\stackrel{\#x_1,\ldots,x_n}{=}$ to indicate
that the variables $x_1, \ldots, x_n$ need to be fresh in the left hand side. Equations hold on
pairs of terms of the same type. As usual, we only distinguish terms up to $\alpha$-renaming
of bound variables.


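• function evaluation: $\mathrm{ev}_{\tau,\sigma} \in \mathbf{Syn}((\tau \to \sigma) * \tau, \sigma)$;

• currying: $\Lambda_{\tau,\sigma,\rho}(t) \in \mathbf{Syn}(\tau, \sigma \to \rho)$ for any $t \in \mathbf{Syn}(\tau * \sigma, \rho)$;

• operations: $\mathrm{op} \in \mathbf{Syn}(\mathbf{real}^{n_1} * \ldots * \mathbf{real}^{n_k}, \mathbf{real}^m)$ for any $\mathrm{op} \in \mathrm{Op}^m_{n_1,\ldots,n_k}$;
— all subject to the usual equations of a Cartesian closed category [26].


$\mathbf{1}$ and $*$ give finite products in Syn, while $\to$ gives categorical exponentials.

Syn has the following universal property: for any Cartesian closed category
$(\mathcal{C}, \mathbf{1}, \times, \Rightarrow)$, we obtain a unique Cartesian closed functor $F : \mathbf{Syn} \to \mathcal{C}$, once we
choose objects $F\mathbf{real}^n$ of $\mathcal{C}$ as well as, for each $\mathrm{op} \in \mathrm{Op}^m_{n_1,\ldots,n_k}$, make well-typed
choices of $\mathcal{C}$-morphisms $F\mathrm{op} : (F\mathbf{real}^{n_1} \times \ldots \times F\mathbf{real}^{n_k}) \to F\mathbf{real}^m$.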
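The universal property can be illustrated by interpreting combinator terms in the Cartesian closed structure of Python values and functions. The sketch below assumes a hypothetical encoding of combinators as nested tuples and represents $\mathbf{real}$ by a float. It covers only the combinators listed above and performs no type checking.

```python
def interpret(term, ops):
    """Interpret a combinator term as a Python function (a sketch of F : Syn -> C)."""
    tag = term[0]
    if tag == "id":
        return lambda x: x
    if tag == "comp":                      # t;s  (first t, then s)
        t, s = interpret(term[1], ops), interpret(term[2], ops)
        return lambda x: s(t(x))
    if tag == "unit":
        return lambda x: ()
    if tag == "pair":
        t, s = interpret(term[1], ops), interpret(term[2], ops)
        return lambda x: (t(x), s(x))
    if tag == "fst":
        return lambda p: p[0]
    if tag == "snd":
        return lambda p: p[1]
    if tag == "ev":
        return lambda p: p[0](p[1])
    if tag == "curry":
        t = interpret(term[1], ops)
        return lambda a: lambda b: t((a, b))
    if tag == "op":
        return ops[term[1]]                # the chosen interpretation F(op)
    raise ValueError(f"unknown combinator {tag!r}")


# Hypothetical choices of F(op) for two binary operations on real.
OPS = {"mul": lambda p: p[0] * p[1], "add": lambda p: p[0] + p[1]}

# x : real |- (\y. y * x) x, written point-free as <curry(swap;mul), id> ; ev
swap = ("pair", ("snd",), ("fst",))
square = ("comp",
          ("pair", ("curry", ("comp", swap, ("op", "mul"))), ("id",)),
          ("ev",))
double = ("comp", ("pair", ("id",), ("id",)), ("op", "add"))
```

Once the dictionary `OPS` is fixed, every other clause of `interpret` is determined by the product and exponential structure, which is the content of the uniqueness claim.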


4 Linear λ-Calculus as an Idealised AD Target Language


As a target language for our AD source code transformations, we consider a 
language that extends the language of §3 with limited linear types. We could 
opt to work with a full linear logic as in [6] or [4]. Instead, however, we will only 
include the bare minimum of linear type formers that we actually need to phrase 
the AD transformations. The resulting language is closely related to, but more 
minimal than, the Enriched Effect Calculus of [14]. We limit our language in this 
way because we want to stress that the resulting code transformations can easily 
be implemented in existing functional languages such as Haskell or O’Caml. As 
we discuss in §9, the idea will be to make use of a module system to implement 
the required linear types as abstract data types. 

In our idealised target language, we consider linear types (aka computation
types) $\underline{\tau}, \underline{\sigma}, \underline{\rho}$, in addition to the Cartesian types (aka value types) $\tau, \sigma, \rho$ that
we have considered so far. We think of Cartesian types as denoting spaces and
linear types as denoting spaces equipped with an algebraic structure. As we are 
interested in studying differentiation, the relevant space structure in this instance 
is a geometric structure that suffices to define differentiability. Meanwhile, the 
relevant algebraic structure on linear types turns out to be that of a commutative 
monoid, as this algebraic structure is needed to phrase automatic differentiation 
algorithms. Indeed, we will use the linear types to denote spaces of (co)tangent 
vectors to the spaces of primals denoted by Cartesian types. These spaces of 
(co)tangents form a commutative monoid under addition. 

Concretely, we extend the types and terms of our language as follows: 


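$\underline{\tau}, \underline{\sigma}, \underline{\rho}$ ::=                linear types
      | $\mathbf{real}^n$                              real array
      | $\underline{\mathbf{1}}$                       unit type
      | $\underline{\tau} * \underline{\sigma}$        binary product
      | $\tau \to \underline{\sigma}$                  function
      | $!\tau \otimes \underline{\sigma}$             tensor product

$\tau, \sigma, \rho$ ::=                               Cartesian types
      | ...                                            as in §3
      | $\underline{\tau} \multimap \underline{\sigma}$    linear function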
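$\dfrac{}{\Gamma; x : \underline{\tau} \vdash x : \underline{\tau}}$
$\dfrac{\Gamma \vdash t : \mathrm{Dom}(\mathrm{lop}) \quad \Gamma; x : \underline{\tau} \vdash s : \mathrm{LDom}(\mathrm{lop}) \quad (\mathrm{lop} \in \mathrm{LOp}^m_{n_1,\ldots,n_k;n'_1,\ldots,n'_l})}{\Gamma; x : \underline{\tau} \vdash \mathrm{lop}(t; s) : \mathbf{real}^m}$

$\dfrac{}{\Gamma; x : \underline{\tau} \vdash \langle\rangle : \underline{\mathbf{1}}}$
$\dfrac{\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma} \quad \Gamma; x : \underline{\tau} \vdash s : \underline{\rho}}{\Gamma; x : \underline{\tau} \vdash \langle t, s\rangle : \underline{\sigma} * \underline{\rho}}$
$\dfrac{\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma} * \underline{\rho}}{\Gamma; x : \underline{\tau} \vdash \mathbf{fst}\ t : \underline{\sigma}}$
$\dfrac{\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma} * \underline{\rho}}{\Gamma; x : \underline{\tau} \vdash \mathbf{snd}\ t : \underline{\rho}}$

$\dfrac{\Gamma, y : \sigma; x : \underline{\tau} \vdash t : \underline{\rho}}{\Gamma; x : \underline{\tau} \vdash \lambda y.t : \sigma \to \underline{\rho}}$
$\dfrac{\Gamma; x : \underline{\tau} \vdash t : \sigma \to \underline{\rho} \quad \Gamma \vdash s : \sigma}{\Gamma; x : \underline{\tau} \vdash t\ s : \underline{\rho}}$

$\dfrac{\Gamma \vdash t : \sigma \quad \Gamma; x : \underline{\tau} \vdash s : \underline{\rho}}{\Gamma; x : \underline{\tau} \vdash\ !t \otimes s :\ !\sigma \otimes \underline{\rho}}$
$\dfrac{\Gamma; x : \underline{\tau} \vdash t :\ !\sigma \otimes \underline{\rho} \quad \Gamma, y : \sigma; z : \underline{\rho} \vdash s : \underline{\rho}'}{\Gamma; x : \underline{\tau} \vdash \mathbf{case}\ t\ \mathbf{of}\ !y \otimes z \to s : \underline{\rho}'}$

$\dfrac{\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma}}{\Gamma \vdash \underline{\lambda}x.t : \underline{\tau} \multimap \underline{\sigma}}$
$\dfrac{\Gamma \vdash t : \underline{\rho} \multimap \underline{\sigma} \quad \Gamma; x : \underline{\tau} \vdash s : \underline{\rho}}{\Gamma; x : \underline{\tau} \vdash t\{s\} : \underline{\sigma}}$

$\dfrac{}{\Gamma; x : \underline{\tau} \vdash 0 : \underline{\sigma}}$
$\dfrac{\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma} \quad \Gamma; x : \underline{\tau} \vdash s : \underline{\sigma}}{\Gamma; x : \underline{\tau} \vdash t + s : \underline{\sigma}}$


Fig. 3. Typing rules for the idealised AD target language with linear types.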


$t, s, r$ ::=                                                            terms
      | ...                                                              as in §3
      | $\mathrm{lop}(t; s)$                                             linear op.
      | $!t \otimes s$ | $\mathbf{case}\ t\ \mathbf{of}\ !y \otimes z \to s$     tensor product
      | $\underline{\lambda}x.t$ | $t\{s\}$                              abstraction/appl.
      | $0$ | $t + s$                                                    monoid structure.

We work with linear operations $\mathrm{lop} \in \mathrm{LOp}^m_{n_1,\ldots,n_k;n'_1,\ldots,n'_l}$, which are intended to
represent functions which are linear (in the sense of respecting $0$ and $+$) in the
last $l$ arguments but not in the first $k$. We write $\mathrm{Dom}(\mathrm{lop}) \stackrel{\mathrm{def}}{=} \mathbf{real}^{n_1} * \ldots * \mathbf{real}^{n_k}$
and $\mathrm{LDom}(\mathrm{lop}) \stackrel{\mathrm{def}}{=} \mathbf{real}^{n'_1} * \ldots * \mathbf{real}^{n'_l}$ for $\mathrm{lop} \in \mathrm{LOp}^m_{n_1,\ldots,n_k;n'_1,\ldots,n'_l}$. These
operations can include e.g. dense and sparse matrix-vector multiplications. Their
purpose is to serve as primitives to implement derivatives $D\mathrm{op}(x; y)$ and $(D\mathrm{op})^t(x; y)$
of the operations op from the source language as terms that are linear in $y$.
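As an example of such linear operations, consider the derivative and transposed derivative of elementwise multiplication. The following sketch uses hypothetical function names and a list encoding of arrays. Each function takes the non-linear arguments first and the linear argument last, mirroring the notation $\mathrm{lop}(t; s)$.

```python
def d_mul(x, y):
    """Hypothetical lop: derivative of elementwise product at x = (x1, x2), linear in y = (y1, y2)."""
    (x1, x2), (y1, y2) = x, y
    return [b * dy1 + a * dy2 for a, b, dy1, dy2 in zip(x1, x2, y1, y2)]


def d_mul_transposed(x, w):
    """Hypothetical lop: transposed derivative of elementwise product at x, linear in w."""
    x1, x2 = x
    return ([b * wi for b, wi in zip(x2, w)], [a * wi for a, wi in zip(x1, w)])


def vec_add(u, v):
    """Elementwise addition of two arrays of equal length."""
    return [a + b for a, b in zip(u, v)]
```

Both functions respect $0$ and $+$ in their last argument, but not in `x`, where they depend on the point at which the derivative is taken.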

In addition to the judgement $\Gamma \vdash t : \tau$, which we encountered in §3, we now
consider an additional judgement $\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma}$. While we think of the former as
denoting a (structure-preserving) function between spaces, we think of the latter
as a (structure-preserving) function from the space which $\Gamma$ denotes to the space
of (structure-preserving) monoid homomorphisms from the denotation of $\underline{\tau}$ to
that of $\underline{\sigma}$. In this instance, “structure-preserving” will mean differentiable.

Fig. 3 displays the typing rules of our language. We consider the terms of
this language up to the $\beta\eta+$-equational theory of Fig. 4. It includes $\beta\eta$-rules as
well as commutative monoid and homomorphism laws.


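$\mathbf{case}\ !t \otimes s\ \mathbf{of}\ !x \otimes y \to r = r[^{t}/_{x}, {}^{s}/_{y}]$        $t[^{s}/_{x}] \stackrel{\#y,z}{=} \mathbf{case}\ s\ \mathbf{of}\ !y \otimes z \to t[^{!y \otimes z}/_{x}]$
$(\underline{\lambda}x.t)\{s\} = t[^{s}/_{x}]$        $t \stackrel{\#x}{=} \underline{\lambda}x.t\{x\}$

$t + 0 = t$    $0 + t = t$    $(t + s) + r = t + (s + r)$    $t + s = s + t$
$(\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma}) \Rightarrow t[^{0}/_{x}] = 0$        $(\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma}) \Rightarrow t[^{s + r}/_{x}] = t[^{s}/_{x}] + t[^{r}/_{x}]$


Fig. 4. Equational rules for the idealised, linear AD language, which we use on top of
the rules of Fig. 2. In addition to standard $\beta\eta$-rules for $!(-) \otimes (-)$- and $\multimap$-types, we
add rules making $(0, +)$ into a commutative monoid on the terms of each linear type
as well as rules which say that terms of linear types are homomorphisms in their linear
variable. Equations hold on pairs of terms of the same type.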
5 Semantics of the Source and Target Languages


5.1 Preliminaries 


Category theory We assume familiarity with categories, functors, natural 
transformations, and their theory of (co)limits and adjunctions. We write: 


— nullary, binary, and $I$-ary products as $\mathbf{1}$, $X_1 \times X_2$, and $\prod_{i \in I} X_i$, writing $\pi_i$
for the projections and $()$, $(x_1, x_2)$, and $(x_i)_{i \in I}$ for the tupling maps;

— nullary, binary, and $I$-ary coproducts as $\mathbf{0}$, $X_1 + X_2$, and $\sum_{i \in I} X_i$, writing $\iota_i$
for the injections and $[]$, $[x_1, x_2]$, and $[x_i]_{i \in I}$ for the cotupling maps;

— exponentials as $Y \Rightarrow X$, writing $\Lambda$ and $\mathrm{ev}$ for currying and evaluation.


Monoids We assume familiarity with the category CMon of commutative
monoids $X = (|X|, 0_X, +_X)$, such as $\mathbb{R}^n \stackrel{\mathrm{def}}{=} (\mathbb{R}^n, 0, +)$, their cartesian product
$X \times Y$, tensor product $X \otimes Y$, and the free monoid $!S$ on a set $S$ (write $\delta$ for the
inclusion $S \hookrightarrow |!S|$). We will sometimes write $\sum_{i=1}^n x_i$ for $((x_1 + x_2) + \ldots) \ldots + x_n$.
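For intuition, the free commutative monoid on a set of hashable elements can be modelled by multisets. The sketch below uses `collections.Counter` for this purpose. The helper names `delta`, `formal_sum` and `extend` are chosen for the example, with `extend` illustrating how a function out of $S$ determines a monoid homomorphism out of $!S$.

```python
from collections import Counter


def delta(x):
    """The inclusion S -> |!S|: the formal sum consisting of the single generator x."""
    return Counter({x: 1})


def formal_sum(*elements):
    """delta(x_1) + ... + delta(x_n) in the free commutative monoid."""
    total = Counter()                      # the zero of !S
    for x in elements:
        total = total + delta(x)
    return total


def extend(f, zero, plus):
    """Extend f : S -> M to the monoid homomorphism !S -> M that it determines.

    M is assumed to be a commutative monoid with unit `zero` and operation `plus`.
    """
    def homomorphism(multiset):
        result = zero
        for x, count in multiset.items():
            for _ in range(count):
                result = plus(result, f(x))
        return result
    return homomorphism
```

Since a `Counter` forgets the order in which generators were added, commutativity holds by construction.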

Recall that a category $\mathcal{C}$ is called CMon-enriched if we have a commutative
monoid structure on each homset $\mathcal{C}(C, C')$ and function composition gives
monoid homomorphisms $\mathcal{C}(C, C') \otimes \mathcal{C}(C', C'') \to \mathcal{C}(C, C'')$. Finite products in a
category $\mathcal{C}$ are well-known to be biproducts (i.e. simultaneously products and
coproducts) if and only if $\mathcal{C}$ is CMon-enriched (see e.g. [17]): define $[] \stackrel{\mathrm{def}}{=} 0$ and
$[f, g] \stackrel{\mathrm{def}}{=} \pi_1; f + \pi_2; g$ and, conversely, $0 \stackrel{\mathrm{def}}{=} []$ and $f + g \stackrel{\mathrm{def}}{=} (\mathrm{id}, \mathrm{id}); [f, g]$.
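The two directions of this correspondence can be sketched for linear maps on arrays, represented here as Python functions on lists. The helper names are hypothetical, and the example assumes that the hom-set addition is pointwise addition of arrays.

```python
def vec_add(u, v):
    """Pointwise addition, used as the monoid operation on arrays."""
    return [a + b for a, b in zip(u, v)]


def copair(f, g):
    """[f, g] = pi_1;f + pi_2;g : X x X' -> Y, defined from the enrichment."""
    return lambda pair: vec_add(f(pair[0]), g(pair[1]))


def diagonal(x):
    """(id, id) : X -> X x X."""
    return (x, x)


def hom_sum(f, g):
    """f + g recovered from the biproduct structure as (id, id);[f, g]."""
    return lambda x: copair(f, g)(diagonal(x))


double = lambda v: [2.0 * a for a in v]
flip = lambda v: list(reversed(v))
```

Evaluating `hom_sum(double, flip)` at a point gives the same result as adding the two outputs directly.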


5.2 Abstract Semantics 


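The language of §3 has a canonical interpretation in any Cartesian closed category
$(\mathcal{C}, \mathbf{1}, \times, \Rightarrow)$, once we fix $\mathcal{C}$-objects $\llbracket \mathbf{real}^n \rrbracket$ to interpret $\mathbf{real}^n$ and
$\mathcal{C}$-morphisms $\llbracket \mathrm{op} \rrbracket \in \mathcal{C}(\llbracket \mathrm{Dom}(\mathrm{op}) \rrbracket, \llbracket \mathbf{real}^m \rrbracket)$ to interpret $\mathrm{op} \in \mathrm{Op}^m_{n_1,\ldots,n_k}$. We interpret
types $\tau$ and contexts $\Gamma$ as $\mathcal{C}$-objects $\llbracket \tau \rrbracket$ and $\llbracket \Gamma \rrbracket$:

$\llbracket x_1 : \tau_1, \ldots, x_n : \tau_n \rrbracket \stackrel{\mathrm{def}}{=} \llbracket \tau_1 \rrbracket \times \ldots \times \llbracket \tau_n \rrbracket$    $\llbracket \mathbf{1} \rrbracket \stackrel{\mathrm{def}}{=} \mathbf{1}$    $\llbracket \tau * \sigma \rrbracket \stackrel{\mathrm{def}}{=} \llbracket \tau \rrbracket \times \llbracket \sigma \rrbracket$    $\llbracket \tau \to \sigma \rrbracket \stackrel{\mathrm{def}}{=} \llbracket \tau \rrbracket \Rightarrow \llbracket \sigma \rrbracket$.

We interpret terms $\Gamma \vdash t : \tau$ as morphisms $\llbracket t \rrbracket$ in $\mathcal{C}(\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket)$:

$\llbracket x_1 : \tau_1, \ldots, x_n : \tau_n \vdash x_k : \tau_k \rrbracket \stackrel{\mathrm{def}}{=} \pi_k$    $\llbracket \langle\rangle \rrbracket \stackrel{\mathrm{def}}{=} ()$    $\llbracket \langle t, s\rangle \rrbracket \stackrel{\mathrm{def}}{=} (\llbracket t \rrbracket, \llbracket s \rrbracket)$
$\llbracket \mathbf{fst}\ t \rrbracket \stackrel{\mathrm{def}}{=} \llbracket t \rrbracket; \pi_1$    $\llbracket \mathbf{snd}\ t \rrbracket \stackrel{\mathrm{def}}{=} \llbracket t \rrbracket; \pi_2$    $\llbracket \lambda x.t \rrbracket \stackrel{\mathrm{def}}{=} \Lambda(\llbracket t \rrbracket)$    $\llbracket t\ s \rrbracket \stackrel{\mathrm{def}}{=} (\llbracket t \rrbracket, \llbracket s \rrbracket); \mathrm{ev}$.

This is an instance of the universal property of Syn mentioned in §3.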

We discuss how to extend $\llbracket - \rrbracket$ to apply to the full target language of §4.
Suppose that $\mathcal{L} : \mathcal{C}^{op} \to \mathbf{Cat}$ is a locally indexed category (see e.g. [27, §§9.3.4]),
i.e. a (strict) contravariant functor from $\mathcal{C}$ to the category Cat of categories, such
that $\mathrm{ob}\,\mathcal{L}(C) = \mathrm{ob}\,\mathcal{L}(C')$ and $\mathcal{L}(f)(L) = L$ for any object $L$ of $\mathrm{ob}\,\mathcal{L}(C)$ and any
$f : C' \to C$ in $\mathcal{C}$. We say that $\mathcal{L}$ is biadditive if each category $\mathcal{L}(C)$ has (chosen)
finite biproducts $(\mathbf{1}, \times)$ and $\mathcal{L}(f)$ preserves them, for any $f : C' \to C$ in $\mathcal{C}$, in the
sense that $\mathcal{L}(f)(\mathbf{1}) = \mathbf{1}$ and $\mathcal{L}(f)(L \times L') = \mathcal{L}(f)(L) \times \mathcal{L}(f)(L')$. We say that it
supports $!(-) \otimes (-)$-types and $\Rightarrow$-types, if $\mathcal{L}(\pi_1)$ has a left adjoint $!C' \otimes_C -$ and a
right adjoint functor $C' \Rightarrow_C -$, for each product projection $\pi_1 : C \times C' \to C$ in
$\mathcal{C}$, satisfying a Beck-Chevalley condition: $!C' \otimes_C L =\ !C' \otimes_{C''} L$ and $C' \Rightarrow_C L =
C' \Rightarrow_{C''} L$ for any $C, C'' \in \mathrm{ob}\,\mathcal{C}$. We simply write $!C' \otimes L$ and $C' \Rightarrow L$. Let us write
$\Phi$ and $\Psi$ for the natural isomorphisms $\mathcal{L}(C)(!C' \otimes L, L') \cong \mathcal{L}(C \times C')(L, L')$
and $\mathcal{L}(C \times C')(L, L') \cong \mathcal{L}(C)(L, C' \Rightarrow L')$. We say that $\mathcal{L}$ supports Cartesian
$\multimap$-types if the functor $\mathcal{C}^{op} \to \mathbf{Set}$; $C \mapsto \mathcal{L}(C)(L, L')$ is representable for any
objects $L, L'$ of $\mathcal{L}$. That is, we have objects $L \multimap L'$ of $\mathcal{C}$ with isomorphisms
$\underline{\Lambda} : \mathcal{L}(C)(L, L') \cong \mathcal{C}(C, L \multimap L')$, natural in $C$. We call an $\mathcal{L}$ satisfying all these
conditions a categorical model of the language of §4. In particular, any biadditive
model of intuitionistic linear logic [29,17] is such a categorical model.

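If we choose $\llbracket \mathbf{real}^n \rrbracket \in \mathrm{ob}\,\mathcal{L}$ to interpret $\mathbf{real}^n$ and compatible $\mathcal{L}$-morphisms
$\llbracket \mathrm{lop} \rrbracket$ in $\mathcal{L}(\llbracket \mathrm{Dom}(\mathrm{lop}) \rrbracket)(\llbracket \mathrm{LDom}(\mathrm{lop}) \rrbracket, \llbracket \mathbf{real}^m \rrbracket)$ for each $\mathrm{lop} \in \mathrm{LOp}^m_{n_1,\ldots,n_k;n'_1,\ldots,n'_l}$,
we can interpret linear types $\underline{\tau}$ as objects $\llbracket \underline{\tau} \rrbracket$ of $\mathcal{L}$:

$\llbracket \underline{\mathbf{1}} \rrbracket \stackrel{\mathrm{def}}{=} \mathbf{1}$    $\llbracket \underline{\tau} * \underline{\sigma} \rrbracket \stackrel{\mathrm{def}}{=} \llbracket \underline{\tau} \rrbracket \times \llbracket \underline{\sigma} \rrbracket$    $\llbracket \tau \to \underline{\sigma} \rrbracket \stackrel{\mathrm{def}}{=} \llbracket \tau \rrbracket \Rightarrow \llbracket \underline{\sigma} \rrbracket$    $\llbracket !\tau \otimes \underline{\sigma} \rrbracket \stackrel{\mathrm{def}}{=}\ !\llbracket \tau \rrbracket \otimes \llbracket \underline{\sigma} \rrbracket$.

We can interpret $\underline{\tau} \multimap \underline{\sigma}$ as the $\mathcal{C}$-object $\llbracket \underline{\tau} \multimap \underline{\sigma} \rrbracket \stackrel{\mathrm{def}}{=} \llbracket \underline{\tau} \rrbracket \multimap \llbracket \underline{\sigma} \rrbracket$. Finally,
we can interpret terms $\Gamma \vdash t : \tau$ as morphisms $\llbracket t \rrbracket$ in $\mathcal{C}(\llbracket \Gamma \rrbracket, \llbracket \tau \rrbracket)$ and terms
$\Gamma; x : \underline{\tau} \vdash t : \underline{\sigma}$ as $\llbracket t \rrbracket$ in $\mathcal{L}(\llbracket \Gamma \rrbracket)(\llbracket \underline{\tau} \rrbracket, \llbracket \underline{\sigma} \rrbracket)$:

$\llbracket \Gamma; x : \underline{\tau} \vdash x : \underline{\tau} \rrbracket \stackrel{\mathrm{def}}{=} \mathrm{id}_{\llbracket \underline{\tau} \rrbracket}$    $\llbracket \mathrm{lop}(t; s) \rrbracket \stackrel{\mathrm{def}}{=} \llbracket s \rrbracket; \mathcal{L}(\llbracket t \rrbracket)(\llbracket \mathrm{lop} \rrbracket)$
$\llbracket \langle\rangle \rrbracket \stackrel{\mathrm{def}}{=} ()$    $\llbracket \langle t, s\rangle \rrbracket \stackrel{\mathrm{def}}{=} (\llbracket t \rrbracket, \llbracket s \rrbracket)$    $\llbracket \mathbf{fst}\ t \rrbracket \stackrel{\mathrm{def}}{=} \llbracket t \rrbracket; \pi_1$    $\llbracket \mathbf{snd}\ t \rrbracket \stackrel{\mathrm{def}}{=} \llbracket t \rrbracket; \pi_2$
$\llbracket \lambda y.t \rrbracket \stackrel{\mathrm{def}}{=} \Psi(\llbracket t \rrbracket)$    $\llbracket t\ s \rrbracket \stackrel{\mathrm{def}}{=} \mathcal{L}(\langle \mathrm{id}, \llbracket s \rrbracket\rangle)(\Psi^{-1}(\llbracket t \rrbracket))$
$\llbracket !t \otimes s \rrbracket \stackrel{\mathrm{def}}{=} \mathcal{L}(\langle \mathrm{id}, \llbracket t \rrbracket\rangle)(\Phi(\mathrm{id})); (!\llbracket \sigma \rrbracket \otimes \llbracket s \rrbracket)$    $\llbracket \mathbf{case}\ t\ \mathbf{of}\ !y \otimes x \to s \rrbracket \stackrel{\mathrm{def}}{=} \llbracket t \rrbracket; \Phi^{-1}(\llbracket s \rrbracket)$
$\llbracket \underline{\lambda}x.t \rrbracket \stackrel{\mathrm{def}}{=} \underline{\Lambda}(\llbracket t \rrbracket)$    $\llbracket t\{s\} \rrbracket \stackrel{\mathrm{def}}{=} \llbracket s \rrbracket; \underline{\Lambda}^{-1}(\llbracket t \rrbracket)$    $\llbracket 0 \rrbracket \stackrel{\mathrm{def}}{=} 0$    $\llbracket t + s \rrbracket \stackrel{\mathrm{def}}{=} (\mathrm{id}, \mathrm{id}); [\llbracket t \rrbracket, \llbracket s \rrbracket]$.

Observe that we interpret $0$ and $+$ using the biproduct structure of $\mathcal{L}$.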


Proposition 1. The interpretation $\llbracket - \rrbracket$ of the language of §4 in categorical models
is both sound and complete with respect to the $\beta\eta+$-equational theory: $t \stackrel{\beta\eta+}{=} s$
iff $\llbracket t \rrbracket = \llbracket s \rrbracket$ in each such model.


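Soundness follows by case analysis on the $\beta\eta+$-rules. Completeness follows
by the construction of the syntactic model $\mathbf{LSyn} : \mathbf{CSyn}^{op} \to \mathbf{Cat}$:


— CSyn extends its full subcategory Syn with Cartesian $\multimap$-types;

— Objects of $\mathbf{LSyn}(\tau)$ are linear types $\underline{\sigma}$ of our target language.

— Morphisms in $\mathbf{LSyn}(\tau)(\underline{\sigma}, \underline{\rho})$ are terms $x : \tau; y : \underline{\sigma} \vdash t : \underline{\rho}$ modulo $(\alpha)\beta\eta+$-equivalence.

— Identities in $\mathbf{LSyn}(\tau)$ are represented by the terms $x : \tau; y : \underline{\sigma} \vdash y : \underline{\sigma}$.

— Composition of $x : \tau; y_1 : \underline{\sigma}_1 \vdash t : \underline{\sigma}_2$ and $x : \tau; y_2 : \underline{\sigma}_2 \vdash s : \underline{\sigma}_3$ in $\mathbf{LSyn}(\tau)$ is
defined by the capture avoiding substitution $x : \tau; y_1 : \underline{\sigma}_1 \vdash s[^{t}/_{y_2}] : \underline{\sigma}_3$.

— Change of base $\mathbf{LSyn}(t) : \mathbf{LSyn}(\tau) \to \mathbf{LSyn}(\tau')$ along $(x' : \tau' \vdash t : \tau) \in
\mathbf{CSyn}(\tau', \tau)$ is defined $\mathbf{LSyn}(t)(x : \tau; y : \underline{\sigma} \vdash s : \underline{\rho}) \stackrel{\mathrm{def}}{=} x' : \tau'; y : \underline{\sigma} \vdash s[^{t}/_{x}] : \underline{\rho}$.

— All type formers are interpreted as one expects based on their notation, using
introduction and elimination rules for the required structural isomorphisms.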

5.3 Concrete Semantics 


Diffeological Spaces Throughout this paper, we have an instance of the abstract
semantics of our languages in mind, as we intend to interpret $\mathbf{real}^n$ as
the usual Euclidean space $\mathbb{R}^n$ and to interpret each program $x_1 : \mathbf{real}^{n_1}, \ldots, x_k :
\mathbf{real}^{n_k} \vdash t : \mathbf{real}^m$ as a smooth ($C^\infty$-) function $\mathbb{R}^{n_1} \times \ldots \times \mathbb{R}^{n_k} \to \mathbb{R}^m$. A challenge
is that the usual settings for multivariate calculus and differential geometry
do not form Cartesian closed categories, obstructing the interpretation of higher
types (see [20, Appx. A]). A solution, recently employed by [20], is to work with
diffeological spaces [33,21], which generalise the usual notions of differentiability
from Euclidean spaces and smooth manifolds to apply to higher types (as well
as a range of other types such as sum and inductive types). We will also follow
this route and use such spaces to construct our concrete semantics. Other valid
options for a concrete semantics exist: convenient vector spaces [19,7], Frölicher
spaces [18], or synthetic differential geometry [25], to name a few. We choose to
work with diffeological spaces mostly because they seem to us to provide the simplest
way to define and analyse the semantics of a rich class of language features.

Diffeological spaces formalise the intuition that a higher-order function is 
smooth if it sends smooth functions to smooth functions, meaning that we can 
never use it to build non-smooth first-order functions. This intuition is reminis- 
cent of a logical relation, and it is realised by directly axiomatising smooth maps 
into the space, rather than treating smoothness as a derived property. 


Definition 1. A diffeological space $X = (|X|, \mathcal{P}_X)$ consists of a set $|X|$ together
with, for each $n \in \mathbb{N}$ and each open subset $U$ of $\mathbb{R}^n$, a set $\mathcal{P}^U_X$ of functions
$U \to |X|$ called plots, such that
— (constant) all constant functions are plots;
— (rearrangement) if $f : V \to U$ is smooth and $p \in \mathcal{P}^U_X$, then $f; p \in \mathcal{P}^V_X$;
— (gluing) if $(p_i \in \mathcal{P}^{U_i}_X)_{i \in I}$ is a compatible family of plots ($x \in U_i \cap U_j \Rightarrow
p_i(x) = p_j(x)$) and $(U_i)_{i \in I}$ covers $U$, then the gluing $p : U \to |X| : x \in U_i \mapsto
p_i(x)$ is a plot.
We think of plots as the maps that are axiomatically deemed “smooth”. We call
a function $f : X \to Y$ between diffeological spaces smooth if, for all plots $p \in \mathcal{P}^U_X$,
we have that $p; f \in \mathcal{P}^U_Y$. We write $\mathbf{Diff}(X, Y)$ for the set of smooth maps from $X$
to $Y$. Smooth functions compose, and so we have a category Diff of diffeological
spaces and smooth functions. We give some examples of such spaces.


Example 1 (Manifold diffeology). Given any open subset $X$ of a Euclidean space
$\mathbb{R}^n$ (or, more generally, a smooth manifold $X$), we can take the set of smooth
($C^\infty$) functions $U \to X$ in the traditional sense as $\mathcal{P}^U_X$. Given another such
space $X'$, then $\mathbf{Diff}(X, X')$ coincides precisely with the set of smooth functions
$X \to X'$ in the traditional sense of calculus and differential geometry.


Put differently, the categories CartSp of Euclidean spaces and Man of 
smooth manifolds with smooth functions form full subcategories of Diff. 


Example 2 (Product diffeology). Given diffeological spaces $(X_i)_{i \in I}$, we can equip
$\prod_{i \in I} |X_i|$ with the product diffeology: $\mathcal{P}^U_{\prod_{i \in I} X_i} \stackrel{\mathrm{def}}{=} \{(\alpha_i)_{i \in I} \mid \alpha_i \in \mathcal{P}^U_{X_i}\}$.


Example 3 (Functional diffeology). Given diffeological spaces $X, Y$, we can equip
$\mathbf{Diff}(X, Y)$ with the functional diffeology $\mathcal{P}^U_{X \Rightarrow Y} \stackrel{\mathrm{def}}{=} \{\Lambda(\alpha) \mid \alpha \in \mathbf{Diff}(U \times X, Y)\}$.

Examples 2 and 3 give us the categorical product and exponential objects,
respectively, in Diff. The embeddings of CartSp and Man into Diff preserve
products (and coproducts).

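We work with the concrete semantics, where we fix $\mathcal{C} = \mathbf{Diff}$ as the target
for interpreting Cartesian types and their terms. That is, by choosing the
interpretation $\llbracket \mathbf{real}^n \rrbracket \stackrel{\mathrm{def}}{=} \mathbb{R}^n$ and by interpreting each $\mathrm{op} \in \mathrm{Op}^m_{n_1,\ldots,n_k}$ as the smooth
function $\llbracket \mathrm{op} \rrbracket : \mathbb{R}^{n_1} \times \ldots \times \mathbb{R}^{n_k} \to \mathbb{R}^m$ that it is intended to represent, we obtain
a unique interpretation $\llbracket - \rrbracket : \mathbf{CSyn} \to \mathbf{Diff}$.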


Diffeological Monoids To interpret linear types and their terms, we need a 
semantic setting $\mathcal{L}$ that is both compatible with Diff and enriched over the category
of commutative monoids. We choose to work with commutative diffeological
monoids, that is, commutative monoids internal to the category Diff.


Definition 2. A diffeological monoid $X = (|X|, \mathcal{P}_X, 0_X, +_X)$ consists of a
diffeological space $(|X|, \mathcal{P}_X)$ with a monoid structure $(0_X \in |X|, (+_X) : |X| \times |X| \to
|X|)$, such that $+_X$ is smooth. We call a diffeological monoid commutative if the
underlying monoid structure on $|X|$ is commutative.


We write $\mathbf{Diff}_{CM}$ for the category whose objects are commutative diffeological
monoids and whose morphisms $(|X|, \mathcal{P}_X, 0_X, +_X) \to (|Y|, \mathcal{P}_Y, 0_Y, +_Y)$
are functions $f : |X| \to |Y|$ that are both smooth $(|X|, \mathcal{P}_X) \to (|Y|, \mathcal{P}_Y)$ and
monoid homomorphisms $(|X|, 0_X, +_X) \to (|Y|, 0_Y, +_Y)$. Given that $\mathbf{Diff}_{CM}$ is
CMon-enriched, finite products are biproducts.


Example 4. The real numbers $\mathbb{R}$ form a commutative diffeological monoid by 
combining their standard diffeology with their usual commutative monoid structure 
(0, +). Similarly, $\mathbb{N} \in \mathbf{Diff}_{CM}$ by equipping $\mathbb{N}$ with (0, +) and the discrete 
diffeology, in which plots are locally constant functions. 


Example 5. We form the (categorical) product in $\mathbf{Diff}_{CM}$ of $(X_i)_{i \in I}$ by 
equipping $\prod_{i \in I} |X_i|$ with the product diffeology and product monoid structure. 


Example 6. For a commutative diffeological monoid X, we can equip the monoid 
$!(|X|, 0_X, +_X)$ with the diffeology 
$P^U_{!X} \stackrel{\mathrm{def}}{=} \{\sum_{i=1}^n \alpha_i \mid n \in \mathbb{N} \text{ and } \alpha_i \in P^U_X\}$. 


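Example 7. Given commutative diffeological monoids X and Y, we can equip 
the tensor product monoid $(|X|, 0_X, +_X) \otimes (|Y|, 0_Y, +_Y)$ with the tensor product 
diffeology: 
$P^U_{X \otimes Y} \stackrel{\mathrm{def}}{=} \{\sum_{i=1}^n \alpha_i \otimes \beta_i \mid n \in \mathbb{N} \text{ and } \alpha_i \in P^U_X, \beta_i \in P^U_Y\}$. 


In this paper, we only use the combined operation $!X \otimes Y$ (read: $(!X) \otimes Y$). 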


Example 8. Given commutative diffeological monoids X and Y, we can define a 
commutative diffeological monoid $X \multimap Y$ with underlying set $\mathbf{Diff}_{CM}(X, Y)$, 
$0_{X \multimap Y}(x) \stackrel{\mathrm{def}}{=} 0_Y$, $(f +_{X \multimap Y} g)(x) \stackrel{\mathrm{def}}{=} f(x) +_Y g(x)$ and 
$P^U_{X \multimap Y} \stackrel{\mathrm{def}}{=} \ldots$ 

In this paper, we will primarily be interested in $X \multimap Y$ as a diffeological 
space, and we will mostly disregard its monoid structure for now. 


Example 9. Given a diffeological space X and a commutative diffeological monoid 
Y, we can define a commutative diffeological monoid structure $X \Rightarrow Y$ on 
$X \Rightarrow (|Y|, P_Y)$ by using the pointwise monoid structure: 
$0_{X \Rightarrow Y}(x) \stackrel{\mathrm{def}}{=} 0_Y$ and $(f +_{X \Rightarrow Y} g)(x) \stackrel{\mathrm{def}}{=} f(x) +_Y g(x)$. 


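Given $f \in \mathbf{Diff}(X, Y)$, we can define $!f \in \mathbf{Diff}_{CM}(!X, !Y)$ by 
$!f(\sum_{i=1}^n x_i) = \sum_{i=1}^n f(x_i)$. $!$ is a left adjoint to the obvious forgetful functor 
$\mathbf{Diff}_{CM} \to \mathbf{Diff}$, while $!(X \times Y) \cong {!X} \otimes {!Y}$ and $!1 \cong \mathbb{N}$. Seeing that 
$(\mathbb{N}, \otimes, \multimap)$ defines a symmetric monoidal closed structure on $\mathbf{Diff}_{CM}$, cognoscenti 
will recognise that $(\mathbf{Diff}, 1, \times, \Rightarrow) \leftrightarrows (\mathbf{Diff}_{CM}, \mathbb{N}, 1, \times, \otimes, \multimap)$ is a model of 
intuitionistic linear logic [29]. In fact, seeing that $\mathbf{Diff}_{CM}$ is CMon-enriched, 
the model is biadditive [17]. 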
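
The action of $!$ on morphisms can be illustrated on the underlying sets, ignoring diffeologies. The following is a minimal sketch, assuming that finite formal sums are represented as Python multisets (`Counter`); the names `bang`, `zero` and `plus` are ours and do not come from the paper.

```python
from collections import Counter

def bang(f):
    """Lift f : X -> Y to !f : !X -> !Y, acting on finite formal sums."""
    def lifted(formal_sum):
        out = Counter()
        for x, multiplicity in formal_sum.items():
            out[f(x)] += multiplicity  # !f(sum_i x_i) = sum_i f(x_i)
        return out
    return lifted

zero = Counter()  # the empty formal sum

def plus(s, t):
    """Addition of formal sums (multiset union with multiplicities)."""
    return s + t

bang_square = bang(lambda x: x * x)
s = Counter({-2: 1, 2: 1, 3: 2})  # the formal sum (-2) + 2 + 3 + 3
t = Counter({3: 1})               # the formal sum 3

# !f preserves 0 and +, so it is a monoid homomorphism.
assert bang_square(zero) == zero
assert bang_square(plus(s, t)) == plus(bang_square(s), bang_square(t))
```

Note that `bang_square` is additive even though squaring itself is not: linearity is with respect to the formal sum, not the arithmetic of the elements.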

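However, we do not need such a rich type system. For us, the following 
suffices. Define $\mathbf{Diff}_{CM}(X)$, for $X \in \mathrm{ob}\,\mathbf{Diff}$, to have the objects of $\mathbf{Diff}_{CM}$ and 
homsets $\mathbf{Diff}_{CM}(X)(Y, Z) \stackrel{\mathrm{def}}{=} \mathbf{Diff}(X, Y \multimap Z)$. Identities are defined as 
$x \mapsto (y \mapsto y)$ and composition $f ;_{\mathbf{Diff}_{CM}(X)} g$ is defined by 
$x \mapsto (f(x) ;_{\mathbf{Diff}_{CM}} g(x))$. 
Given $f \in \mathbf{Diff}(X, X')$, we define change-of-base $\mathbf{Diff}_{CM}(X') \to \mathbf{Diff}_{CM}(X)$ 
as $\mathbf{Diff}_{CM}(f)(g) \stackrel{\mathrm{def}}{=} f ;_{\mathbf{Diff}} g$. $\mathbf{Diff}_{CM}(-)$ defines a locally indexed category. By 
taking $\mathcal{C} = \mathbf{Diff}$ and $\mathcal{L}(-) = \mathbf{Diff}_{CM}(-)$, we obtain a concrete instance of our 
abstract semantics. Indeed, we have natural isomorphisms 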


$$\mathbf{Diff}_{CM}(X)(!X' \otimes Y, Z) \cong \mathbf{Diff}_{CM}(X \times X')(Y, Z)$$ 

$$\mathbf{Diff}_{CM}(X \times X')(Y, Z) \cong \mathbf{Diff}_{CM}(X)(Y, X' \Rightarrow Z)$$ 
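The first sends $g \in \mathbf{Diff}_{CM}(X \times X')(Y, Z)$ to 
$x \mapsto (\sum_{i} x'_i \otimes y_i \mapsto \sum_{i} g(x, x'_i)(y_i))$, and the second sends $g$ to 
$x \mapsto (y \mapsto (x' \mapsto g(x, x')(y)))$. 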


The prime motivating examples of morphisms in this category are derivatives. 
Recall that the derivative at x, $Df(x)$, and transposed derivative at x, $(Df)^t(x)$, 
of a smooth function $f : \mathbb{R}^n \to \mathbb{R}^m$ are defined as the unique functions 
$Df(x) : \mathbb{R}^n \to \mathbb{R}^m$ and $(Df)^t(x) : \mathbb{R}^m \to \mathbb{R}^n$ satisfying 

$$Df(x)(v) = \lim_{\delta \to 0} \frac{f(x + \delta \cdot v) - f(x)}{\delta} \qquad (Df)^t(x)(w) \bullet v = w \bullet Df(x)(v),$$ 

where we write $v \bullet v'$ for the inner product $\sum_{i=1}^n (\pi_i v) \cdot (\pi_i v')$ of vectors $v, v' \in \mathbb{R}^n$. 
Now, for $f \in \mathbf{Diff}(\mathbb{R}^n, \mathbb{R}^m)$, $Df$ and $(Df)^t$ give maps in $\mathbf{Diff}_{CM}(\mathbb{R}^n)(\mathbb{R}^n, \mathbb{R}^m)$ 
and $\mathbf{Diff}_{CM}(\mathbb{R}^n)(\mathbb{R}^m, \mathbb{R}^n)$, respectively. Indeed, derivatives $Df(x)$ of f at x are 
linear functions, as are transposed derivatives $(Df)^t(x)$. Both depend smoothly 
on x in case f is $C^\infty$-smooth. Note that the derivatives are not merely linear in 
the sense of preserving 0 and +. They are also multiplicative in the sense that 
$(Df)(x)(c \cdot v) = c \cdot (Df)(x)(v)$. We could have captured this property by working 
with vector spaces internal to Diff. However, we will not need this property to 
phrase or establish correctness of AD. Therefore, we restrict our attention to the 
more straightforward structure of commutative monoids. 
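
The two defining equations for $Df(x)$ and $(Df)^t(x)$ can be checked numerically on a small example. The sketch below assumes a hypothetical smooth map `f` from $\mathbb{R}^2$ to $\mathbb{R}^3$ with a hand-written Jacobian; all names are illustrative and not taken from the paper.

```python
def f(x):
    """A hypothetical smooth map R^2 -> R^3, chosen only for illustration."""
    x0, x1 = x
    return [x0 * x1, x0 + 2.0 * x1, x0 ** 2]

def jacobian(x):
    """Hand-derived Jacobian of f at x (3 rows, 2 columns)."""
    x0, x1 = x
    return [[x1, x0], [1.0, 2.0], [2.0 * x0, 0.0]]

def D(x):
    """Derivative at x: a linear map R^2 -> R^3."""
    J = jacobian(x)
    return lambda v: [sum(J[i][j] * v[j] for j in range(2)) for i in range(3)]

def Dt(x):
    """Transposed derivative at x: a linear map R^3 -> R^2."""
    J = jacobian(x)
    return lambda w: [sum(J[i][j] * w[i] for i in range(3)) for j in range(2)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, v, w = [1.5, -2.0], [0.5, 3.0], [1.0, 1.0, 2.0]

# Transposition: (Df)^t(x)(w) . v == w . Df(x)(v)
assert abs(dot(Dt(x)(w), v) - dot(w, D(x)(v))) < 1e-12

# Directional derivative as a limit, approximated by a finite difference.
delta = 1e-6
shifted = f([xi + delta * vi for xi, vi in zip(x, v)])
finite_diff = [(a - b) / delta for a, b in zip(shifted, f(x))]
assert all(abs(p - q) < 1e-4 for p, q in zip(finite_diff, D(x)(v)))
```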

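Defining $\llbracket \mathbf{real}^n \rrbracket \stackrel{\mathrm{def}}{=} \mathbb{R}^n$ and interpreting each $\mathrm{lop} \in \mathrm{LOp}$ as the smooth 
function $\llbracket \mathrm{lop} \rrbracket : (\mathbb{R}^{n_1} \times \ldots \times \mathbb{R}^{n_k}) \to (\mathbb{R}^{n'_1} \times \ldots \times \mathbb{R}^{n'_l}) \multimap \mathbb{R}^m$ it is intended to 
represent, we obtain a canonical interpretation of our target language in $\mathbf{Diff}_{CM}$. 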


6 Pairing Primals with Tangents/Adjoints, Categorically 


In this section, we show that any categorical model $\mathcal{L} : \mathcal{C}^{op} \to \mathbf{Cat}$ of our target 
language gives rise to two Cartesian closed categories $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ (which 
we wrote S[L] and D[L] in §2). We believe these observations of Cartesian 
closure are novel. Surprisingly, they are highly relevant for obtaining a principled 
understanding of AD on a higher-order language: the former for forward AD, and 
the latter for reverse AD. Applying these constructions to the syntactic category 
$\mathbf{LSyn} : \mathbf{CSyn}^{op} \to \mathbf{Cat}$ of our language, we produce a canonical definition of 
the AD macros, as the canonical interpretation of the λ-calculus in the Cartesian 
closed categories $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}$ and $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$. In addition, when we apply 
this construction to the denotational semantics $\mathbf{Diff}_{CM} : \mathbf{Diff}^{op} \to \mathbf{Cat}$ and 
invoke a categorical logical relations technique, known as subsconing, we find 
an elegant correctness proof of the source code transformations. The abstract 
construction delineated in this section is in many ways the theoretical crux of 
this paper. 


6.1 Grothendieck Constructions on Strictly Indexed Categories 


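Recall that for any strictly indexed category, i.e. a (strict) functor $\mathcal{L} : \mathcal{C}^{op} \to \mathbf{Cat}$, 
we can consider its total category (or Grothendieck construction) $\Sigma_{\mathcal{C}}\mathcal{L}$, which 
is a fibred category over $\mathcal{C}$ (see [23, sections A1.1.7, B1.3.1]). We can view it as 
a Σ-type of categories, which generalizes the Cartesian product. Concretely, its 
objects are pairs $(A_1, A_2)$ of objects $A_1$ of $\mathcal{C}$ and $A_2$ of $\mathcal{L}(A_1)$. Its morphisms 
$(A_1, A_2) \to (B_1, B_2)$ are pairs $(f_1, f_2)$ of a morphism $f_1 : A_1 \to B_1$ in $\mathcal{C}$ and a 
morphism $f_2 : A_2 \to \mathcal{L}(f_1)(B_2)$ in $\mathcal{L}(A_1)$. Identities are 
$\mathrm{id}_{(A_1, A_2)} \stackrel{\mathrm{def}}{=} (\mathrm{id}_{A_1}, \mathrm{id}_{A_2})$ 
and composition is $(f_1, f_2); (g_1, g_2) \stackrel{\mathrm{def}}{=} (f_1; g_1, f_2; \mathcal{L}(f_1)(g_2))$. Further, given a 
strictly indexed category $\mathcal{L} : \mathcal{C}^{op} \to \mathbf{Cat}$, we can consider its fibrewise dual 
category $\mathcal{L}^{op} : \mathcal{C}^{op} \to \mathbf{Cat}$, which is defined as the composition 
$\mathcal{C}^{op} \xrightarrow{\mathcal{L}} \mathbf{Cat} \xrightarrow{op} \mathbf{Cat}$. 
Thus, we can apply the same construction to $\mathcal{L}^{op}$ to obtain a category $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$. 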


6.2 Structure of $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ for Locally Indexed Categories 


§6.1 applies, in particular, to the locally indexed categories of §5. In this case, 
we will analyze the categorical structure of $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$. For reference, we 
first give a concrete description. 

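$\Sigma_{\mathcal{C}}\mathcal{L}$ is the following category: 


— objects are pairs $(A_1, A_2)$ of objects $A_1$ of $\mathcal{C}$ and $A_2$ of $\mathcal{L}$; 
— morphisms $(A_1, A_2) \to (B_1, B_2)$ are pairs $(f_1, f_2)$ with $f_1 : A_1 \to B_1 \in \mathcal{C}$ 
and $f_2 : A_2 \to B_2 \in \mathcal{L}(A_1)$; 
— composition of $(A_1, A_2) \xrightarrow{(f_1, f_2)} (B_1, B_2)$ and $(B_1, B_2) \xrightarrow{(g_1, g_2)} (C_1, C_2)$ is 
given by $(f_1; g_1, f_2; \mathcal{L}(f_1)(g_2))$ and identities $\mathrm{id}_{(A_1, A_2)}$ are $(\mathrm{id}_{A_1}, \mathrm{id}_{A_2})$. 

$\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ is the following category: 


— objects are pairs $(A_1, A_2)$ of objects $A_1$ of $\mathcal{C}$ and $A_2$ of $\mathcal{L}$; 
— morphisms $(A_1, A_2) \to (B_1, B_2)$ are pairs $(f_1, f_2)$ with $f_1 : A_1 \to B_1 \in \mathcal{C}$ 
and $f_2 : B_2 \to A_2 \in \mathcal{L}(A_1)$; 
— composition of $(A_1, A_2) \xrightarrow{(f_1, f_2)} (B_1, B_2)$ and $(B_1, B_2) \xrightarrow{(g_1, g_2)} (C_1, C_2)$ is 
given by $(f_1; g_1, \mathcal{L}(f_1)(g_2); f_2)$ and identities $\mathrm{id}_{(A_1, A_2)}$ are $(\mathrm{id}_{A_1}, \mathrm{id}_{A_2})$. 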
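
The two composition rules can be made concrete in the model where the base morphisms are ordinary functions and a fibre morphism over $A_1$ is a function sending a point of $A_1$ to a linear map. The following is a minimal sketch under that assumption; the names `compose_fwd` and `compose_rev` and the example maps are ours, not the paper's.

```python
def compose_fwd(f, g):
    """(f1, f2);(g1, g2) in Sigma_C L: second components run in the same direction."""
    f1, f2 = f
    g1, g2 = g
    return (lambda x: g1(f1(x)),
            # f2 ; L(f1)(g2): reindex g2 along f1, then compose pointwise
            lambda x: (lambda v: g2(f1(x))(f2(x)(v))))

def compose_rev(f, g):
    """(f1, f2);(g1, g2) in Sigma_C L^op: second components run backwards."""
    f1, f2 = f
    g1, g2 = g
    return (lambda x: g1(f1(x)),
            # L(f1)(g2) ; f2: the linear parts compose in the opposite order
            lambda x: (lambda w: f2(x)(g2(f1(x))(w))))

# Hypothetical example: f(x) = (x, x^2) : R -> R^2 and g(a, b) = a * b : R^2 -> R.
f_primal = lambda x: (x, x * x)
g_primal = lambda p: p[0] * p[1]

f_fwd = (f_primal, lambda x: (lambda v: (v, 2 * x * v)))
g_fwd = (g_primal, lambda p: (lambda dp: p[1] * dp[0] + p[0] * dp[1]))

f_rev = (f_primal, lambda x: (lambda w: w[0] + 2 * x * w[1]))
g_rev = (g_primal, lambda p: (lambda w: (p[1] * w, p[0] * w)))

h_fwd = compose_fwd(f_fwd, g_fwd)   # composite is x |-> x^3
h_rev = compose_rev(f_rev, g_rev)

assert h_fwd[0](2.0) == 8.0
assert h_fwd[1](2.0)(1.0) == 12.0   # derivative of x^3 at 2 is 3 * 2^2
assert h_rev[1](2.0)(1.0) == 12.0   # transposed derivative agrees for scalars
```

With derivatives as the linear parts, `compose_fwd` is the chain rule for tangents and `compose_rev` is the chain rule for adjoints.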


We examine the categorical structure present in $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ for 
categorical models $\mathcal{L}$ in the sense of §5 (i.e., in case $\mathcal{L}$ has biproducts and supports 
$\Rightarrow$-, $!(-) \otimes (-)$-, and Cartesian $\multimap$-types). We believe this is a novel observation. We 
will make heavy use of it to define our AD algorithms and to prove them correct. 


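Proposition 2. $\Sigma_{\mathcal{C}}\mathcal{L}$ has terminal object $1 = (1, 1)$, binary product 
$(A_1, A_2) \times (B_1, B_2) = (A_1 \times B_1, A_2 \times B_2)$, and exponential 
$(A_1, A_2) \Rightarrow (B_1, B_2) = (A_1 \Rightarrow (B_1 \times (A_2 \multimap B_2)), A_1 \Rightarrow B_2)$. 


Proof. We have (natural) bijections 

\begin{align*}
\Sigma_{\mathcal{C}}\mathcal{L}((A_1, A_2), (1, 1)) &= \mathcal{C}(A_1, 1) \times \mathcal{L}(A_1)(A_2, 1) \cong 1 \times 1 \cong 1 && \{\, 1 \text{ terminal in } \mathcal{C} \text{ and } \mathcal{L}(A_1) \,\} \\[1ex]
\Sigma_{\mathcal{C}}\mathcal{L}((A_1, A_2), (B_1 \times C_1, B_2 \times C_2)) &= \mathcal{C}(A_1, B_1 \times C_1) \times \mathcal{L}(A_1)(A_2, B_2 \times C_2) \\
&\cong \mathcal{C}(A_1, B_1) \times \mathcal{C}(A_1, C_1) \times \mathcal{L}(A_1)(A_2, B_2) \times \mathcal{L}(A_1)(A_2, C_2) && \{\, \times \text{ product in } \mathcal{C} \text{ and } \mathcal{L}(A_1) \,\} \\
&\cong \Sigma_{\mathcal{C}}\mathcal{L}((A_1, A_2), (B_1, B_2)) \times \Sigma_{\mathcal{C}}\mathcal{L}((A_1, A_2), (C_1, C_2)) \\[1ex]
\Sigma_{\mathcal{C}}\mathcal{L}((A_1, A_2) \times (B_1, B_2), (C_1, C_2)) &= \Sigma_{\mathcal{C}}\mathcal{L}((A_1 \times B_1, A_2 \times B_2), (C_1, C_2)) \\
&= \mathcal{C}(A_1 \times B_1, C_1) \times \mathcal{L}(A_1 \times B_1)(A_2 \times B_2, C_2) \\
&\cong \mathcal{C}(A_1 \times B_1, C_1) \times \mathcal{L}(A_1 \times B_1)(A_2, C_2) \times \mathcal{L}(A_1 \times B_1)(B_2, C_2) && \{\, \times \text{ coproducts in } \mathcal{L}(A_1 \times B_1) \,\} \\
&\cong \mathcal{C}(A_1 \times B_1, C_1) \times \mathcal{L}(A_1)(A_2, B_1 \Rightarrow C_2) \times \mathcal{L}(A_1 \times B_1)(B_2, C_2) && \{\, \Rightarrow\text{-types} \,\} \\
&\cong \mathcal{C}(A_1 \times B_1, C_1) \times \mathcal{L}(A_1)(A_2, B_1 \Rightarrow C_2) \times \mathcal{C}(A_1 \times B_1, B_2 \multimap C_2) && \{\, \text{Cartesian } \multimap\text{-types} \,\} \\
&\cong \mathcal{C}(A_1 \times B_1, C_1 \times (B_2 \multimap C_2)) \times \mathcal{L}(A_1)(A_2, B_1 \Rightarrow C_2) && \{\, \times \text{ is product in } \mathcal{C} \,\} \\
&\cong \mathcal{C}(A_1, B_1 \Rightarrow (C_1 \times (B_2 \multimap C_2))) \times \mathcal{L}(A_1)(A_2, B_1 \Rightarrow C_2) && \{\, \Rightarrow \text{ is exponential in } \mathcal{C} \,\} \\
&= \Sigma_{\mathcal{C}}\mathcal{L}((A_1, A_2), (B_1 \Rightarrow (C_1 \times (B_2 \multimap C_2)), B_1 \Rightarrow C_2)) \\
&= \Sigma_{\mathcal{C}}\mathcal{L}((A_1, A_2), (B_1, B_2) \Rightarrow (C_1, C_2)).
\end{align*}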


We observe that we need $\mathcal{L}$ to have biproducts (equivalently: to be CMon 
enriched) in order to show Cartesian closure. Further, we need linear $\Rightarrow$-types 
and Cartesian $\multimap$-types to construct exponentials. 


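Proposition 3. $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ has terminal object $1 = (1, 1)$, binary product 
$(A_1, A_2) \times (B_1, B_2) = (A_1 \times B_1, A_2 \times B_2)$, and exponential 
$(A_1, A_2) \Rightarrow (B_1, B_2) = (A_1 \Rightarrow (B_1 \times (B_2 \multimap A_2)), {!A_1} \otimes B_2)$. 

Proof. We have (natural) bijections 

\begin{align*}
\Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1, A_2), (1, 1)) &= \mathcal{C}(A_1, 1) \times \mathcal{L}(A_1)(1, A_2) \cong 1 \times 1 \cong 1 && \{\, 1 \text{ terminal in } \mathcal{C}, \text{ initial in } \mathcal{L}(A_1) \,\} \\[1ex]
\Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1, A_2), (B_1 \times C_1, B_2 \times C_2)) &= \mathcal{C}(A_1, B_1 \times C_1) \times \mathcal{L}(A_1)(B_2 \times C_2, A_2) \\
&\cong \mathcal{C}(A_1, B_1) \times \mathcal{C}(A_1, C_1) \times \mathcal{L}(A_1)(B_2, A_2) \times \mathcal{L}(A_1)(C_2, A_2) && \{\, \times \text{ product in } \mathcal{C}, \text{ coproduct in } \mathcal{L}(A_1) \,\} \\
&= \Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1, A_2), (B_1, B_2)) \times \Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1, A_2), (C_1, C_2)) \\[1ex]
\Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1, A_2) \times (B_1, B_2), (C_1, C_2)) &= \Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1 \times B_1, A_2 \times B_2), (C_1, C_2)) \\
&= \mathcal{C}(A_1 \times B_1, C_1) \times \mathcal{L}(A_1 \times B_1)(C_2, A_2 \times B_2) \\
&\cong \mathcal{C}(A_1 \times B_1, C_1) \times \mathcal{L}(A_1 \times B_1)(C_2, A_2) \times \mathcal{L}(A_1 \times B_1)(C_2, B_2) && \{\, \times \text{ is product in } \mathcal{L}(A_1 \times B_1) \,\} \\
&\cong \mathcal{C}(A_1 \times B_1, C_1) \times \mathcal{C}(A_1 \times B_1, C_2 \multimap B_2) \times \mathcal{L}(A_1 \times B_1)(C_2, A_2) && \{\, \text{Cartesian } \multimap\text{-types} \,\} \\
&\cong \mathcal{C}(A_1 \times B_1, C_1 \times (C_2 \multimap B_2)) \times \mathcal{L}(A_1 \times B_1)(C_2, A_2) && \{\, \times \text{ is product in } \mathcal{C} \,\} \\
&\cong \mathcal{C}(A_1, B_1 \Rightarrow (C_1 \times (C_2 \multimap B_2))) \times \mathcal{L}(A_1 \times B_1)(C_2, A_2) && \{\, \Rightarrow \text{ is exponential in } \mathcal{C} \,\} \\
&\cong \mathcal{C}(A_1, B_1 \Rightarrow (C_1 \times (C_2 \multimap B_2))) \times \mathcal{L}(A_1)({!B_1} \otimes C_2, A_2) && \{\, !(-) \otimes (-)\text{-types} \,\} \\
&= \Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1, A_2), (B_1 \Rightarrow (C_1 \times (C_2 \multimap B_2)), {!B_1} \otimes C_2)) \\
&= \Sigma_{\mathcal{C}}\mathcal{L}^{op}((A_1, A_2), (B_1, B_2) \Rightarrow (C_1, C_2)).
\end{align*}

Observe that we need the biproduct structure of $\mathcal{L}$ to construct finite 
products in $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$. Further, we need Cartesian $\multimap$-types and $!(-) \otimes (-)$-types, but 
not biproducts, to construct exponentials. 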


7 Novel AD Algorithms as Source-Code Transformations 


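As $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}$ and $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$ are both Cartesian closed categories by §6, 
the universal property of Syn yields unique structure-preserving macros, 
$\overrightarrow{\mathcal{D}}(-) : \mathbf{Syn} \to \Sigma_{\mathbf{CSyn}}\mathbf{LSyn}$ (forward AD) and 
$\overleftarrow{\mathcal{D}}(-) : \mathbf{Syn} \to \Sigma_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$ (reverse 
AD), once we fix a compatible definition for the macros on $\mathbf{real}^n$ and basic 
operations op. By definition of equality in Syn, $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}$ and $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$, 
these macros automatically respect equational reasoning principles, in the sense 
that $t = s$ in Syn implies that $\overrightarrow{\mathcal{D}}(t) = \overrightarrow{\mathcal{D}}(s)$ and $\overleftarrow{\mathcal{D}}(t) = \overleftarrow{\mathcal{D}}(s)$. 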

We need to choose suitable terms $D\mathrm{op}(x; y)$ and $D\mathrm{op}^t(x; y)$ to represent the 
forward- and reverse-mode derivatives of the basic operations $\mathrm{op} \in \mathrm{Op}^m_{n_1,\ldots,n_k}$. For 
example, for elementwise multiplication $(*) \in \mathrm{Op}^n_{n,n}$, we can define 
$D(*)(x; y) = (\mathbf{fst}\ x) * (\mathbf{snd}\ y) + (\mathbf{snd}\ x) * (\mathbf{fst}\ y)$ and 
$D(*)^t(x; y) = \langle (\mathbf{snd}\ x) * y, (\mathbf{fst}\ x) * y \rangle$, 
where we use (linear) elementwise multiplication $(*) \in \mathrm{LOp}^n_{n;n}$. We represent 
derivatives as linear functions. This representation allows for efficient 
Jacobian-vector/adjoint product implementations, which avoid first calculating a full 
Jacobian and next taking a product. Such implementations are known to be 
important to achieve performant AD systems. 
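
The two derivative terms for elementwise multiplication translate directly into code. The following is a minimal sketch, assuming vectors are Python lists and pairs are tuples; the function names `D_mul` and `D_mul_t` are ours.

```python
def mul(x):
    """Elementwise multiplication of a pair of vectors x = (a, b)."""
    a, b = x
    return [p * q for p, q in zip(a, b)]

def D_mul(x, y):
    """Forward derivative: (fst x) * (snd y) + (snd x) * (fst y)."""
    (a, b), (da, db) = x, y
    return [p * dq + q * dp for p, q, dp, dq in zip(a, b, da, db)]

def D_mul_t(x, y):
    """Reverse derivative: <(snd x) * y, (fst x) * y>."""
    a, b = x
    return ([q * w for q, w in zip(b, y)], [p * w for p, w in zip(a, y)])

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

x = ([1.0, 2.0], [3.0, 4.0])      # primal input (a, b)
dx = ([0.5, -1.0], [2.0, 0.0])    # tangent (da, db)
w = [1.0, -2.0]                   # adjoint of the output

tangent_out = D_mul(x, dx)
adj_a, adj_b = D_mul_t(x, w)

# Transposition: <D^t(x; w), dx> == <w, D(x; dx)>, without building a Jacobian.
assert dot(adj_a, dx[0]) + dot(adj_b, dx[1]) == dot(w, tangent_out)
```

Neither function materialises the $n \times 2n$ Jacobian; each costs a constant number of elementwise passes.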


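$$\overrightarrow{\mathcal{D}}(\mathbf{real}^n)_1 \stackrel{\mathrm{def}}{=} \mathbf{real}^n \qquad \overrightarrow{\mathcal{D}}(\mathbf{real}^n)_2 \stackrel{\mathrm{def}}{=} \mathbf{real}^n \qquad \overleftarrow{\mathcal{D}}(\mathbf{real}^n)_1 \stackrel{\mathrm{def}}{=} \mathbf{real}^n \qquad \overleftarrow{\mathcal{D}}(\mathbf{real}^n)_2 \stackrel{\mathrm{def}}{=} \mathbf{real}^n$$ 

$$\overrightarrow{\mathcal{D}}(\mathrm{op})_1 \stackrel{\mathrm{def}}{=} \mathrm{op} \qquad \overrightarrow{\mathcal{D}}(\mathrm{op})_2 \stackrel{\mathrm{def}}{=} x : \mathbf{real}^{n_1} * .. * \mathbf{real}^{n_k};\ y : \mathbf{real}^{n_1} * .. * \mathbf{real}^{n_k} \vdash D\mathrm{op}(x; y) : \mathbf{real}^m$$ 

$$\overleftarrow{\mathcal{D}}(\mathrm{op})_1 \stackrel{\mathrm{def}}{=} \mathrm{op} \qquad \overleftarrow{\mathcal{D}}(\mathrm{op})_2 \stackrel{\mathrm{def}}{=} x : \mathbf{real}^{n_1} * .. * \mathbf{real}^{n_k};\ y : \mathbf{real}^m \vdash D\mathrm{op}^t(x; y) : \mathbf{real}^{n_1} * .. * \mathbf{real}^{n_k}$$ 

For the AD transformations to be correct, it is important that these derivatives 
of language primitives are implemented correctly in the sense that 

$$\llbracket x; y \vdash D\mathrm{op}(x; y) \rrbracket = D\llbracket \mathrm{op} \rrbracket \qquad \llbracket x; y \vdash D\mathrm{op}^t(x; y) \rrbracket = (D\llbracket \mathrm{op} \rrbracket)^t.$$ 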


In practice, AD library developers tend to assume the subtle task of correctly 
implementing such derivatives $D\mathrm{op}(x; y)$ and $D\mathrm{op}^t(x; y)$ whenever a new primitive 
operation op is added to the library. 

The extension of the AD macros $\overrightarrow{\mathcal{D}}$ and $\overleftarrow{\mathcal{D}}$ to the full source language is now 
canonically determined, as the unique Cartesian closed functors that extend the 
previous definitions, following the categorical structure described in §6. Because 
of the counter-intuitive nature of the Cartesian closed structures on $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}$ 
and $\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$, we list the full macros explicitly in [36, Appx. A]. 


8 Proving Reverse and Forward AD Semantically Correct 


In this section, we will show that the source code transformations described in §7 
correctly implement mathematical derivatives. We make correctness precise as 
the statement that for programs $x : \tau \vdash t : \sigma$ between first-order types $\tau$ and $\sigma$, 
i.e. types not containing any function type constructors, we have that 
$\llbracket \overrightarrow{\mathcal{D}}(t)_2 \rrbracket = D\llbracket t \rrbracket$ and $\llbracket \overleftarrow{\mathcal{D}}(t)_2 \rrbracket = (D\llbracket t \rrbracket)^t$, where $\llbracket - \rrbracket$ is the semantics of §5. The proof mainly 
consists of logical relations arguments over the semantics in $\Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}$ and 
$\Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}^{op}$. This logical relations proof can be phrased in elementary terms, 
but the resulting argument is technical and would be hard to discover. Instead, 
we prefer to phrase it in terms of a categorical subsconing construction, a more 
abstract and elegant perspective on logical relations. We discovered the proof by 
taking this categorical perspective, and, while we have verified the elementary 
argument (see [36, Appx. D]), we would not otherwise have come up with it. 


8.1 Preliminaries 


Subsconing Logical relations arguments provide a powerful proof technique for 
demonstrating properties of typed programs. The arguments proceed by induc- 
tion on the structure of types. Here, we briefly review the basics of categorical 
logical relations arguments, or subsconing constructions. We restrict to the level 
of generality that we need here, but we would like to point out that the theory 
applies much more generally. 

Consider a Cartesian closed category $(\mathcal{C}, 1, \times, \Rightarrow)$. Suppose that we are given 
a functor $F : \mathcal{C} \to \mathbf{Set}$ to the category Set of sets and functions which preserves 
finite products in the sense that $F(1) \cong 1$ and $F(C \times C') \cong F(C) \times F(C')$. 
Then, we can form the subscone of F, or category of logical relations over F, 
which is Cartesian closed, with a faithful Cartesian closed functor $\pi_1$ to $\mathcal{C}$ which 
forgets about the predicates [24]: 


— objects are pairs $(C, P)$ of an object C of $\mathcal{C}$ and a predicate $P \subseteq FC$; 
— morphisms $(C, P) \to (C', P')$ are $\mathcal{C}$ morphisms $f : C \to C'$ which respect 
the predicates in the sense that $F(f)(P) \subseteq P'$; 
— identities and composition are as in $\mathcal{C}$; 
— $(1, F1)$ is the terminal object, and products and exponentials are given by 
$(C, P) \times (C', P') = (C \times C', \{\alpha \in F(C \times C') \mid F(\pi_1)(\alpha) \in P, F(\pi_2)(\alpha) \in P'\})$ 
$(C, P) \Rightarrow (C', P') = (C \Rightarrow C', \{F(\pi_1)(\gamma) \mid \gamma \in F((C \Rightarrow C') \times C) \text{ s.t. } 
F(\pi_2)(\gamma) \in P \text{ implies } F(\mathrm{ev})(\gamma) \in P'\})$. 


In typical applications, $\mathcal{C}$ can be the syntactic category of a language (like 
Syn), the codomain of a denotational semantics $\llbracket - \rrbracket$ (like Diff), or a product of 
the above, if we want to consider n-ary logical relations. Typically, F tends to be 
a hom-functor (which always preserves products), like $\mathcal{C}(1, -)$ or $\mathcal{C}(C_0, -)$, for 
some important object $C_0$. When applied to the syntactic category Syn and 
$F = \mathbf{Syn}(1, -)$, the formulae for products and exponentials in the subscone clearly 
reproduce the usual recipes in traditional, syntactic logical relations arguments. 
As such, subsconing generalises standard logical relations methods. 
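
For intuition, the exponential in the subscone can be computed by brute force when $\mathcal{C}$ is the category of finite sets and $F = \mathcal{C}(1, -)$ is (isomorphic to) the identity. The sketch below assumes the usual reading of the formula: a function belongs to the predicate when it sends every argument in $P$ to a result in $P'$. The function name and the example sets are hypothetical.

```python
from itertools import product

def exponential_predicate(C, P, C2, P2):
    """All functions C -> C2 (as dicts) that map the predicate P into P2."""
    domain = sorted(C)
    related = []
    for values in product(sorted(C2), repeat=len(domain)):
        fn = dict(zip(domain, values))
        if all(fn[x] in P2 for x in P):   # argument in P implies result in P2
            related.append(fn)
    return related

C, P = {0, 1, 2}, {0, 1}
C2, P2 = {"a", "b"}, {"a"}

related = exponential_predicate(C, P, C2, P2)
# Of the 2^3 = 8 functions, those with f(0) = f(1) = "a" survive; f(2) is free.
assert len(related) == 2
assert all(fn[0] == "a" and fn[1] == "a" for fn in related)
```

This is the familiar clause of a logical relation at function type: related inputs are sent to related outputs.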


8.2 Subsconing for Correctness of AD 


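We will apply the subsconing construction above to 

$$\mathcal{C} = \mathbf{Diff} \times \Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM} \qquad F = \mathbf{Diff} \times \Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}((\mathbb{R}, (\mathbb{R}, \mathbb{R})), -) \qquad \text{(forward AD)}$$ 
$$\mathcal{C} = \mathbf{Diff} \times \Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}^{op} \qquad F = \mathbf{Diff} \times \Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}^{op}((\mathbb{R}, (\mathbb{R}, \mathbb{R})), -) \qquad \text{(reverse AD)},$$ 

where we note that Diff, $\Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}$, and $\Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}^{op}$ are Cartesian closed 
(given the arguments of §5 and §6) and that the product of Cartesian closed 
categories is again Cartesian closed. Let us write $\overrightarrow{\mathbf{SScone}}$ and $\overleftarrow{\mathbf{SScone}}$, respectively, 
for the resulting categories of logical relations. 

Seeing that $\overrightarrow{\mathbf{SScone}}$ and $\overleftarrow{\mathbf{SScone}}$ are Cartesian closed, we obtain unique 
Cartesian closed functors $(-)^f : \mathbf{Syn} \to \overrightarrow{\mathbf{SScone}}$ and $(-)^r : \mathbf{Syn} \to \overleftarrow{\mathbf{SScone}}$ once 
we fix an interpretation of $\mathbf{real}^n$ and all operations op. We write $P^f_\tau$ and $P^r_\tau$, 
respectively, for the relations $\pi_2(\tau^f)$ and $\pi_2(\tau^r)$. Let us interpret 

$$(\mathbf{real}^n)^f \stackrel{\mathrm{def}}{=} ((\mathbb{R}^n, (\mathbb{R}^n, \mathbb{R}^n)), \{(f, (g, h)) \mid f = g \text{ and } h = Df\})$$ 
$$(\mathbf{real}^n)^r \stackrel{\mathrm{def}}{=} ((\mathbb{R}^n, (\mathbb{R}^n, \mathbb{R}^n)), \{(f, (g, h)) \mid f = g \text{ and } h = (Df)^t\})$$ 
$$(\mathrm{op})^f \stackrel{\mathrm{def}}{=} (\llbracket \mathrm{op} \rrbracket, (\llbracket \overrightarrow{\mathcal{D}}(\mathrm{op})_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(\mathrm{op})_2 \rrbracket)) \qquad (\mathrm{op})^r \stackrel{\mathrm{def}}{=} (\llbracket \mathrm{op} \rrbracket, (\llbracket \overleftarrow{\mathcal{D}}(\mathrm{op})_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(\mathrm{op})_2 \rrbracket)),$$ 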


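where we write $Df$ for the semantic derivative of f (see §5). We need to verify, 
respectively, that $(\llbracket \mathrm{op} \rrbracket, (\llbracket \overrightarrow{\mathcal{D}}(\mathrm{op})_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(\mathrm{op})_2 \rrbracket))$ and 
$(\llbracket \mathrm{op} \rrbracket, (\llbracket \overleftarrow{\mathcal{D}}(\mathrm{op})_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(\mathrm{op})_2 \rrbracket))$ 
respect the logical relations $P^f$ and $P^r$. This respecting of relations follows 
immediately from the chain rule for multivariate differentiation, as long as we 
have implemented our derivatives correctly for the basic operations op: 

$$\llbracket x; y \vdash D\mathrm{op}(x; y) \rrbracket = D\llbracket \mathrm{op} \rrbracket \quad \text{and} \quad \llbracket x; y \vdash (D\mathrm{op})^t(x; y) \rrbracket = (D\llbracket \mathrm{op} \rrbracket)^t.$$ 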


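Writing $\mathbf{real}^{n_1,..,n_k} \stackrel{\mathrm{def}}{=} \mathbf{real}^{n_1} * .. * \mathbf{real}^{n_k}$ and 
$\mathbb{R}^{n_1,..,n_k} \stackrel{\mathrm{def}}{=} \mathbb{R}^{n_1} \times .. \times \mathbb{R}^{n_k}$, we compute 

$$(\mathbf{real}^{n_1,..,n_k})^f = ((\mathbb{R}^{n_1,..,n_k}, (\mathbb{R}^{n_1,..,n_k}, \mathbb{R}^{n_1,..,n_k})), \{(f, (g, h)) \mid f = g, h = Df\})$$ 
$$(\mathbf{real}^{n_1,..,n_k})^r = ((\mathbb{R}^{n_1,..,n_k}, (\mathbb{R}^{n_1,..,n_k}, \mathbb{R}^{n_1,..,n_k})), \{(f, (g, h)) \mid f = g, h = (Df)^t\})$$ 

since derivatives of tuple-valued functions are computed component-wise. (In 
fact, the corresponding facts hold more generally for any first-order type, as an 
iterated product of $\mathbf{real}^n$.) Suppose that $(f, (g, h)) \in P^f_{\mathbf{real}^{n_1,..,n_k}}$, i.e. $g = f$ and 
$h = Df$. Then, using the chain rule in the last step, we have 

\begin{align*}
(f, (g, h)); (\llbracket \mathrm{op} \rrbracket, (\llbracket \overrightarrow{\mathcal{D}}(\mathrm{op})_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(\mathrm{op})_2 \rrbracket))
&= (f, (f, Df)); (\llbracket \mathrm{op} \rrbracket, (\llbracket \mathrm{op} \rrbracket, \llbracket x; y \vdash D\mathrm{op}(x; y) \rrbracket)) \\
&= (f, (f, Df)); (\llbracket \mathrm{op} \rrbracket, (\llbracket \mathrm{op} \rrbracket, D\llbracket \mathrm{op} \rrbracket)) \\
&= (f; \llbracket \mathrm{op} \rrbracket, (f; \llbracket \mathrm{op} \rrbracket, x \mapsto r \mapsto D\llbracket \mathrm{op} \rrbracket(f(x))(Df(x)(r)))) \\
&= (f; \llbracket \mathrm{op} \rrbracket, (f; \llbracket \mathrm{op} \rrbracket, D(f; \llbracket \mathrm{op} \rrbracket))) \in P^f_{\mathbf{real}^m}.
\end{align*}

Similarly, if $(f, (g, h)) \in P^r_{\mathbf{real}^{n_1,..,n_k}}$, then by the chain rule and linear algebra 

\begin{align*}
(f, (g, h)); (\llbracket \mathrm{op} \rrbracket, (\llbracket \overleftarrow{\mathcal{D}}(\mathrm{op})_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(\mathrm{op})_2 \rrbracket))
&= (f, (f, (Df)^t)); (\llbracket \mathrm{op} \rrbracket, (\llbracket \mathrm{op} \rrbracket, \llbracket x; y \vdash (D\mathrm{op})^t(x; y) \rrbracket)) \\
&= (f, (f, (Df)^t)); (\llbracket \mathrm{op} \rrbracket, (\llbracket \mathrm{op} \rrbracket, (D\llbracket \mathrm{op} \rrbracket)^t)) \\
&= (f; \llbracket \mathrm{op} \rrbracket, (f; \llbracket \mathrm{op} \rrbracket, x \mapsto v \mapsto (Df)^t(x)((D\llbracket \mathrm{op} \rrbracket)^t(f(x))(v)))) \\
&= (f; \llbracket \mathrm{op} \rrbracket, (f; \llbracket \mathrm{op} \rrbracket, x \mapsto v \mapsto (Df(x); D\llbracket \mathrm{op} \rrbracket(f(x)))^t(v))) \\
&= (f; \llbracket \mathrm{op} \rrbracket, (f; \llbracket \mathrm{op} \rrbracket, (D(f; \llbracket \mathrm{op} \rrbracket))^t)) \in P^r_{\mathbf{real}^m}.
\end{align*}

Consequently, we obtain our Cartesian closed functors $(-)^f$ and $(-)^r$. 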
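
The forward-mode half of this calculation can be checked numerically. The sketch below assumes a hypothetical curve `f` from $\mathbb{R}$ to $\mathbb{R}^2$ and takes multiplication as the operation; `in_P_fwd` is an illustrative name for a sampled, finite-difference test of membership in the relation, not a decision procedure.

```python
import math

def in_P_fwd(f, g, h, samples, eps=1e-6, tol=1e-5):
    """Numerically test f = g and h = Df at the given sample points."""
    for x in samples:
        if any(abs(a - b) > tol for a, b in zip(f(x), g(x))):
            return False
        central = [(a - b) / (2 * eps) for a, b in zip(f(x + eps), f(x - eps))]
        # h(x) is linear in its argument, so testing at r = 1 suffices.
        if any(abs(a - b) > tol for a, b in zip(central, h(x)(1.0))):
            return False
    return True

# A curve f : R -> R^2 together with its derivative h = Df.
f = lambda x: [math.sin(x), x * x]
h = lambda x: (lambda r: [math.cos(x) * r, 2 * x * r])

# The operation op(a, b) = a * b and its forward derivative.
op = lambda p: [p[0] * p[1]]
D_op = lambda p: (lambda dp: [p[0] * dp[1] + dp[0] * p[1]])

# Post-composition in the forward category: x |-> r |-> D op (f(x)) (Df(x)(r)).
f_op = lambda x: op(f(x))
h_op = lambda x: (lambda r: D_op(f(x))(h(x)(r)))

samples = [-1.0, 0.0, 0.5, 2.0]
assert in_P_fwd(f, f, h, samples)            # the input triple is related
assert in_P_fwd(f_op, f_op, h_op, samples)   # so is its image under op
assert not in_P_fwd(f_op, f_op, lambda x: (lambda r: [0.0]), samples)
```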

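Further, observe that $\Sigma_{\llbracket - \rrbracket}\llbracket - \rrbracket(t_1, t_2) \stackrel{\mathrm{def}}{=} (\llbracket t_1 \rrbracket, \llbracket t_2 \rrbracket)$ defines a Cartesian closed 
functor $\Sigma_{\llbracket - \rrbracket}\llbracket - \rrbracket : \Sigma_{\mathbf{CSyn}}\mathbf{LSyn} \to \Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}$. Similarly, we get a Cartesian 
closed functor $\Sigma_{\llbracket - \rrbracket}\llbracket - \rrbracket^{op} : \Sigma_{\mathbf{CSyn}}\mathbf{LSyn}^{op} \to \Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}^{op}$. As a consequence, 
the two squares below commute. 

[Diagram: two commuting squares. Forward: the top edge is 
$(\mathrm{id}, \overrightarrow{\mathcal{D}}) : \mathbf{Syn} \to \mathbf{Syn} \times \Sigma_{\mathbf{CSyn}}\mathbf{LSyn}$, the left edge is $(-)^f : \mathbf{Syn} \to \overrightarrow{\mathbf{SScone}}$, the right edge is 
$\llbracket - \rrbracket \times \Sigma_{\llbracket - \rrbracket}\llbracket - \rrbracket$, and the bottom edge is the forgetful functor 
$\overrightarrow{\mathbf{SScone}} \to \mathbf{Diff} \times \Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}$. Reverse: the same square with $\overleftarrow{\mathcal{D}}$, $(-)^r$, 
$\Sigma_{\mathbf{CSyn}}\mathbf{LSyn}^{op}$, $\llbracket - \rrbracket \times \Sigma_{\llbracket - \rrbracket}\llbracket - \rrbracket^{op}$, $\overleftarrow{\mathbf{SScone}}$ and $\Sigma_{\mathbf{Diff}}\mathbf{Diff}_{CM}^{op}$.] 

Indeed, going around the squares in both directions defines Cartesian closed 
functors that agree on their action on $\mathbf{real}^n$ and all operations op. So, by the universal 
property of Syn, they must coincide. In particular, 
$(\llbracket t \rrbracket, (\llbracket \overrightarrow{\mathcal{D}}(t)_1 \rrbracket, \llbracket \overrightarrow{\mathcal{D}}(t)_2 \rrbracket))$ is a 
morphism in $\overrightarrow{\mathbf{SScone}}$ and therefore respects the logical relations $P^f$ for any 
well-typed term t of the source language of §3. Similarly, 
$(\llbracket t \rrbracket, (\llbracket \overleftarrow{\mathcal{D}}(t)_1 \rrbracket, \llbracket \overleftarrow{\mathcal{D}}(t)_2 \rrbracket))$ is a 
morphism in $\overleftarrow{\mathbf{SScone}}$ and therefore respects the logical relations $P^r$. 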

Most of the work is now in place to show correctness of AD. We finish the 
proof below. To ease notation, we work with terms in a context with a single 
type. Doing so is not a restriction as our language has products, and the theorem 
holds for arbitrary terms between first-order types. 


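Theorem 1 (Correctness of AD). For programs $x : \tau \vdash t : \sigma$ between 
first-order types $\tau$ and $\sigma$, 

$$\llbracket \overrightarrow{\mathcal{D}}(t)_1 \rrbracket = \llbracket t \rrbracket \qquad \llbracket \overrightarrow{\mathcal{D}}(t)_2 \rrbracket = D\llbracket t \rrbracket \qquad \llbracket \overleftarrow{\mathcal{D}}(t)_1 \rrbracket = \llbracket t \rrbracket \qquad \llbracket \overleftarrow{\mathcal{D}}(t)_2 \rrbracket = (D\llbracket t \rrbracket)^t,$$ 

where we write $D$ and $(-)^t$ for the usual calculus derivative and matrix transpose. 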


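Proof (sketch, see [36, Appx. B] for details). To show that $\llbracket \overrightarrow{\mathcal{D}}(t)_1 \rrbracket(x) = \llbracket t \rrbracket(x)$ 
and $\llbracket \overrightarrow{\mathcal{D}}(t)_2 \rrbracket(x)(v) = D\llbracket t \rrbracket(x)(v)$, we choose a smooth curve $\gamma : \mathbb{R} \to \llbracket \tau \rrbracket$ such 
that $\gamma(0) = x$ and $D\gamma(0)(1) = v$ and use that t respects the logical relations $P^f$. 

To show that $\llbracket \overleftarrow{\mathcal{D}}(t)_1 \rrbracket(x) = \llbracket t \rrbracket(x)$ and $\llbracket \overleftarrow{\mathcal{D}}(t)_2 \rrbracket(x)(v) = D\llbracket t \rrbracket(x)^t(v)$, we choose 
smooth curves $\gamma_i : \mathbb{R} \to \llbracket \tau \rrbracket$ such that $\gamma_i(0) = x$ and $D\gamma_i(0)(1) = e_i$, for all 
standard basis vectors $e_i$ of $\llbracket \overleftarrow{\mathcal{D}}(\tau)_2 \rrbracket \cong \mathbb{R}^N$. It now follows that $\llbracket \overleftarrow{\mathcal{D}}(t)_1 \rrbracket(x) = \llbracket t \rrbracket(x)$ 
and $e_i \bullet \llbracket \overleftarrow{\mathcal{D}}(t)_2 \rrbracket(x)(v) = e_i \bullet D\llbracket t \rrbracket(x)^t(v)$ as t respects the logical relations $P^r$. 


$$\frac{\Gamma \vdash t : \mathrm{Dom}(\mathrm{lop}) \quad (\mathrm{lop} \in \mathrm{LOp}^m_{n_1,\ldots,n_k; n'_1,\ldots,n'_l})}{\Gamma \vdash \mathrm{lop}(t) : \mathrm{LFun}(\mathrm{LDom}(\mathrm{lop}), \mathbf{real}^m)} \qquad \frac{}{\Gamma \vdash 0 : \tau} \qquad \frac{\Gamma \vdash t : \tau \quad \Gamma \vdash s : \tau}{\Gamma \vdash t + s : \tau}$$ 

$$\frac{}{\Gamma \vdash \mathrm{lid} : \mathrm{LFun}(\tau, \tau)} \qquad \frac{\Gamma \vdash t : \mathrm{LFun}(\tau, \sigma) \quad \Gamma \vdash s : \mathrm{LFun}(\sigma, \rho)}{\Gamma \vdash t; s : \mathrm{LFun}(\tau, \rho)} \qquad \frac{\Gamma \vdash t : \mathrm{LFun}(\tau, \sigma) \quad \Gamma \vdash s : \tau}{\Gamma \vdash \mathrm{lapp}(t, s) : \sigma}$$ 

$$\frac{\Gamma \vdash t : \tau \to \mathrm{LFun}(\sigma, \rho)}{\Gamma \vdash \mathrm{lswap}\ t : \mathrm{LFun}(\sigma, \tau \to \rho)} \qquad \frac{\Gamma \vdash t : \tau}{\Gamma \vdash \mathrm{leval}_t : \mathrm{LFun}(\tau \to \sigma, \sigma)}$$ 

$$\frac{\Gamma \vdash t : \tau}{\Gamma \vdash \{(t, -)\} : \mathrm{LFun}(\sigma, \mathrm{Tens}(\tau, \sigma))} \qquad \frac{\Gamma \vdash t : \tau \to \mathrm{LFun}(\sigma, \rho)}{\Gamma \vdash \mathrm{lcur}^{-1} t : \mathrm{LFun}(\mathrm{Tens}(\tau, \sigma), \rho)} \qquad \frac{}{\Gamma \vdash \mathrm{lfst} : \mathrm{LFun}(\tau * \sigma, \tau)}$$ 

$$\frac{}{\Gamma \vdash \mathrm{lsnd} : \mathrm{LFun}(\tau * \sigma, \sigma)} \qquad \frac{\Gamma \vdash t : \mathrm{LFun}(\tau, \sigma) \quad \Gamma \vdash s : \mathrm{LFun}(\tau, \rho)}{\Gamma \vdash \mathrm{lpair}(t, s) : \mathrm{LFun}(\tau, \sigma * \rho)}$$ 


Fig. 5. Typing rules for the applied target language, to extend the source language. 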


9 Practical Relevance and Implementation 


Popular functional languages, such as Haskell and O’Caml, do not natively sup- 
port linear types. As such, the transformations described in this paper may seem 
hard to implement. However, as we summarize in this section (and detail in [36, 
Appx. C]), we can easily implement the limited linear types needed for the trans- 
formations as abstract data types by using merely a basic module system. 
Specifically, we consider, as an alternative applied target language for our 
transformations, the extension of the source language of §3 with the terms and 
types of Fig. 5. We can define a faithful translation $(-)^\dagger$ from our linear target 
language of §4 to this language: define $(!\tau \otimes \sigma)^\dagger \stackrel{\mathrm{def}}{=} \mathrm{Tens}(\tau^\dagger, \sigma^\dagger)$, 
$(\tau \multimap \sigma)^\dagger \stackrel{\mathrm{def}}{=} \mathrm{LFun}(\tau^\dagger, \sigma^\dagger)$, $(\mathbf{real}^n)^\dagger \stackrel{\mathrm{def}}{=} \mathbf{real}^n$ and extend $(-)^\dagger$ structurally recursively, letting 
it preserve all other type formers. We then translate 
$(x_1 : \tau_1, \ldots, x_n : \tau_n); y : \sigma \vdash t : \rho$ to 
$x_1 : \tau_1^\dagger, \ldots, x_n : \tau_n^\dagger \vdash t^\dagger : (\sigma \multimap \rho)^\dagger$ and 
$x_1 : \tau_1, \ldots, x_n : \tau_n \vdash t : \sigma$ to 
$x_1 : \tau_1^\dagger, \ldots, x_n : \tau_n^\dagger \vdash t^\dagger : \sigma^\dagger$. We believe an interested reader can fill in the details. 
This exhibits the linear target language as a sublanguage of the applied target 
language. The applied target language merely collapses the distinction between 
linear and Cartesian types, and it adds the constructs lapp(t, s) for practical 
usability and to ensure that our adequacy result below is meaningful. 

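We can implement the API of Fig. 5 as a module that defines the abstract 
types $\mathrm{LFun}(\tau, \sigma)$, under the hood implemented as a plain function type $\tau \to \sigma$, 
and $\mathrm{Tens}(\tau, \sigma)$, which is implemented as lists of pairs $\mathrm{List}(\tau * \sigma)$. Then, the 
required terms of Fig. 5 can be implemented as follows, using standard idiom 
[], t :: s, fold op over x in t from acc = init for empty lists, cons-ing, and folding: 

$$0_1 = \langle\rangle \qquad t +_1 s = \langle\rangle \qquad 0_{\tau * \sigma} = \langle 0_\tau, 0_\sigma \rangle \qquad t +_{\tau * \sigma} s = \langle \mathbf{fst}\ t +_\tau \mathbf{fst}\ s, \mathbf{snd}\ t +_\sigma \mathbf{snd}\ s \rangle$$ 
$$0_{\tau \to \sigma} = \lambda\_.0_\sigma \qquad t +_{\tau \to \sigma} s = \lambda x. t\,x +_\sigma s\,x \qquad 0_{\mathrm{LFun}(\tau, \sigma)} = \lambda\_.0_\sigma \qquad t +_{\mathrm{LFun}(\tau, \sigma)} s = \lambda x. t\,x +_\sigma s\,x$$ 
$$0_{\mathrm{Tens}(\tau, \sigma)} = [\,] \qquad t +_{\mathrm{Tens}(\tau, \sigma)} s \stackrel{\mathrm{def}}{=} \mathbf{fold}\ x :: acc\ \mathbf{over}\ x\ \mathbf{in}\ t\ \mathbf{from}\ acc = s$$ 

$$\mathrm{lid} \stackrel{\mathrm{def}}{=} \lambda x.x \qquad t; s \stackrel{\mathrm{def}}{=} \lambda x.s\,(t\,x) \qquad \mathrm{lapp}(t, s) \stackrel{\mathrm{def}}{=} t\,s \qquad \mathrm{lswap}\ t \stackrel{\mathrm{def}}{=} \lambda x.\lambda y.t\,y\,x \qquad \mathrm{leval}_t \stackrel{\mathrm{def}}{=} \lambda x.x\,t$$ 
$$\{(t, -)\} \stackrel{\mathrm{def}}{=} \lambda x.\langle t, x \rangle :: [\,] \qquad \mathrm{lcur}^{-1} t \stackrel{\mathrm{def}}{=} \lambda z.\mathbf{fold}\ t\,(\mathbf{fst}\ x)\,(\mathbf{snd}\ x) + acc\ \mathbf{over}\ x\ \mathbf{in}\ z\ \mathbf{from}\ acc = 0$$ 
$$\mathrm{lfst} \stackrel{\mathrm{def}}{=} \lambda x.\mathbf{fst}\ x \qquad \mathrm{lsnd} \stackrel{\mathrm{def}}{=} \lambda x.\mathbf{snd}\ x \qquad \mathrm{lpair}(t, s) \stackrel{\mathrm{def}}{=} \lambda x.\langle t\,x, s\,x \rangle$$ 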
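
As an illustration, these definitions can be transcribed almost literally into a dynamically typed language. The following is a minimal sketch in Python, where no abstraction barrier is enforced; passing `zero` and `plus` for the result type explicitly to `lcur_inv` is our assumption, standing in for the type-indexed $0$ and $+$ above.

```python
# LFun(tau, sigma) is a plain function; Tens(tau, sigma) is a list of pairs.

lid = lambda x: x

def seq(t, s):                 # t;s
    return lambda x: s(t(x))

def lapp(t, s):
    return t(s)

def lswap(t):
    return lambda x: (lambda y: t(y)(x))

def leval(t):                  # leval_t
    return lambda x: x(t)

def singleton(t):              # {(t, -)}
    return lambda x: [(t, x)]

def tens_plus(t, s):           # fold x :: acc over x in t from acc = s
    acc = s
    for x in t:
        acc = [x] + acc
    return acc

def lcur_inv(t, zero, plus):   # fold t (fst x) (snd x) + acc over x in z from acc = 0
    def run(z):
        acc = zero
        for a, b in z:
            acc = plus(t(a)(b), acc)
        return acc
    return run

lfst = lambda x: x[0]
lsnd = lambda x: x[1]

def lpair(t, s):
    return lambda x: (t(x), s(x))

# Hypothetical usage: scale(c) is the linear map y |-> c * y.
scale = lambda c: (lambda y: c * y)
total = lcur_inv(scale, 0.0, lambda a, b: a + b)

assert total([(2.0, 3.0), (4.0, 0.5)]) == 8.0
assert total(singleton(2.0)(3.0)) == lapp(scale(2.0), 3.0)
assert lswap(scale)(3.0)(2.0) == 6.0
```

In a language with a module system, only the operations of Fig. 5 would be exported, so that clients cannot build non-linear inhabitants of `LFun` or inspect the list behind `Tens`.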


Our denotational semantics extends to this applied target language and is 
adequate with respect to the operational semantics induced by the suggested 
implementation. Further, our correctness proofs of the induced source-code 
translations also transfer to this applied setting, and they can be usefully phrased 
as manual, extensible logical relations proofs. As an application, we can extend 
our source language with higher-order primitives, like 
$\mathrm{map} \in \mathbf{Syn}((\mathbf{real} \to \mathbf{real}) * \mathbf{real}^n, \mathbf{real}^n)$ to “map” functions over the black-box arrays $\mathbf{real}^n$. Then, 
our proofs extend to show that their correct forward and reverse derivatives are 

$$\overrightarrow{\mathcal{D}}(\mathrm{map})_1(f, v) \stackrel{\mathrm{def}}{=} \mathrm{map}(f; \mathbf{fst}, v) \qquad \overrightarrow{\mathcal{D}}(\mathrm{map})_2(f, v)(g, w) \stackrel{\mathrm{def}}{=} \mathrm{map}\ g\ v + \mathrm{zipWith}\ (f; \mathbf{snd})\ v\ w$$ 
$$\overleftarrow{\mathcal{D}}(\mathrm{map})_1(f, v) \stackrel{\mathrm{def}}{=} \mathrm{map}(f; \mathbf{fst}, v) \qquad \overleftarrow{\mathcal{D}}(\mathrm{map})_2(f, v)(w) \stackrel{\mathrm{def}}{=} (\mathrm{zip}\ v\ w, \mathrm{zipWith}\ (f; \mathbf{snd})\ v\ w),$$ 

where we use the standard functional programming idiom zip and zipWith. 
Here, we can operate directly on the internal representations of $\mathrm{LFun}(\tau, \sigma)$ and 
$\mathrm{Tens}(\tau, \sigma)$, as the definitions of derivatives of primitives live inside our module. 
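
These derivative rules for map can be sketched concretely. The code below assumes that a transformed scalar function returns a pair of its value and a linear derivative closure, that vectors are Python lists, and that the adjoint of a function is a list of (point, weight) pairs as in the `Tens` representation; the names `map_fwd` and `map_rev` are ours.

```python
import math

def f(x):
    """Transformed sin: primal value paired with its linear derivative."""
    return (math.sin(x), lambda dx: math.cos(x) * dx)

def map_fwd(f, v):
    primal = [f(x)[0] for x in v]                      # map (f; fst) v
    def tangent(g, w):
        # map g v + zipWith (f; snd) v w
        return [g(x) + f(x)[1](dx) for x, dx in zip(v, w)]
    return primal, tangent

def map_rev(f, v):
    primal = [f(x)[0] for x in v]                      # map (f; fst) v
    def adjoint(w):
        # (zip v w, zipWith (f; snd) v w)
        return (list(zip(v, w)), [f(x)[1](dy) for x, dy in zip(v, w)])
    return primal, adjoint

v = [0.0, 1.0]
g = lambda x: 0.5 * x          # tangent to the function argument
dv = [1.0, 2.0]                # tangent to the vector argument
w = [3.0, -1.0]                # adjoint of the result

_, tangent = map_fwd(f, v)
_, adjoint = map_rev(f, v)
fun_adjoint, vec_adjoint = adjoint(w)

# Pairing the adjoint with the tangent equals pairing w with the output tangent.
lhs = sum(wi * g(xi) for xi, wi in fun_adjoint) + sum(a * b for a, b in zip(vec_adjoint, dv))
rhs = sum(wi * ti for wi, ti in zip(w, tangent(g, dv)))
assert abs(lhs - rhs) < 1e-12
```

For a scalar function the derivative and its transpose coincide, which is why the same closure `f(x)[1]` serves both modes in this sketch.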


10 Related and Future Work 


Related work This work is closely related to [20], which introduced a 
similar semantic correctness proof for a version of forward-mode AD, using a 
subsconing construction. A major difference is that this paper also phrases and 
proves correctness of reverse-mode AD on a λ-calculus and relates reverse-mode 
to forward-mode AD. Using a syntactic logical relations proof instead, [5] also 
proves correctness of forward-mode AD. Again, it does not address reverse AD. 

[11] proposes a similar construction to that of §6, and it relates it to the 
differential λ-calculus. That paper develops sophisticated axiomatics for semantic 
reverse differentiation. However, it neither relates the semantics to a source- 
code transformation, nor discusses differentiation of higher-order functions. Our 
construction of differentiation with a (biadditive) linear target language might 
remind the reader of differential linear logic [15]. In differential linear logic, (for- 
ward) differentiation is a first-class operation in a (biadditive) linear language. 
By contrast, in our treatment, differentiation is a meta-operation. 

Importantly, [16] describes and implements what are essentially our source- 
code transformations, though they were restricted to first-order functions and 
scalars. [37] sketches an extension of the reverse-mode transformation to higher- 
order functions in essentially the same way as proposed in this paper. It does 
not motivate or derive the algorithm or show its correctness. Nevertheless, this 
short paper discusses important practical considerations for implementing the 
algorithm, and it discusses a dependently typed variant of the algorithm. 

Next, there are various lines of work relating to correctness of reverse-mode 
AD that we consider less similar to our work. For example, [28] define and prove 
correct a formulation of reverse-mode AD on a higher-order language that 
depends on a non-standard operational semantics, essentially a form of symbolic 
execution. [2] does something similar for reverse-mode AD on a first-order 
language extended with conditionals and iteration. [8] defines an AD algorithm in a 
simply typed λ-calculus with linear negation (essentially, the continuation-based 
AD of [20]) and proves it correct using operational techniques. Further, they 
show that this algorithm corresponds to reverse-mode AD under a non-standard 
operational semantics (with the “linear factoring rule”). These formulations of 
reverse-mode AD all depend on non-standard run-times and fall into the cat- 
egory of “define-by-run” formulations of reverse-mode AD. Meanwhile, we are 
concerned with “define-then-run” formulations: source-code transformations pro- 
ducing differentiated code at compile-time, which can then be optimized during 
compilation with existing compiler tool-chains. 

Finally, there is a long history of work on reverse-mode AD, though almost 
none of it applies the technique to higher-order functions. A notable exception 
is [31], which gives an impressive source-code transformation implementation of 
reverse AD in Scheme. While very efficient, this implementation crucially uses 
mutation. Moreover, the transformation is complex and correctness is not con- 
sidered. More recently, [38] describes a much simpler implementation of a reverse 
AD code transformation, again very performant. However, the transformation is 
quite different from the one considered in this paper as it relies on a combina- 
tion of delimited continuations and mutable state. Correctness is not considered, 
perhaps because of the semantic complexities introduced by impurity. 

Our work adds to the existing literature by presenting (to our knowledge) 
the first principled and pure define-then-run reverse AD algorithm for a higher- 
order language, by arguing its practical applicability, and by proving semantic 
correctness of the algorithm. 


Future work We plan to build a practical, verified AD library based on the 
methods introduced in this paper. This will involve calculating the derivative of 
many first- and higher-order primitives according to our method. 

Next, we aim to extend our method to other expressive language features. 
We conjecture that the method extends to source languages with variant and 
inductive types as long as one makes the target language a linear dependent type 
theory [10,34]. Indeed, the dimension of (co)tangent spaces to a disjoint union 
of spaces depends on the choice of base point. The required colimits to interpret 
such types in $\Sigma_{\mathcal{C}}\mathcal{L}$ and $\Sigma_{\mathcal{C}}\mathcal{L}^{op}$ should exist by standard results about arrow and 
container categories [3]. We are hopeful that the method can also be made to 
apply to source languages with general recursion by calculating the derivative of 
fixpoint combinators similarly to our calculation for map. The correctness proof 
will then rely on a domain theoretic generalisation of our techniques [35]. 

Abstract. Higher-order functions have become a staple of modern 
programming languages. However, such values stymie concolic testers, as 
the SMT solvers at their hearts are inherently first-order. 

This paper lays formal foundations for concolic testing higher-order 
functional programs. Three ideas enable our results: (i) our tester con- 
siders only program inputs in a canonical form; (ii) it collects novel 
constraints from the evaluation of the canonical inputs to search the 
space of inputs with partial help from an SMT solver and (iii) it col- 
lects constraints from canonical inputs even when they are arguments to 
concretized calls. We prove that (i) concolic evaluation is sound with re- 
spect to concrete evaluation; (ii) modulo concretization and SMT solver 
incompleteness, the search for a counter-example succeeds if a user pro- 
gram has a bug and (iii) this search amounts to directed evolution of 
inputs targeting hard-to-reach corners of the program. 


1 Introduction 


Concolic testing allows symbolic evaluation to leverage concrete inputs as 
it attempts to uncover bugs. The role of concrete inputs is twofold. First, they 
help symbolic evaluation focus on one control-flow path at a time, thus allowing 
the exploration of the behavior of a user program in an incremental and directed 
fashion. Second, they enable concretization, permitting symbolic evaluation to 
seamlessly switch to concrete evaluation and back, thus facilitating 
interoperability with external libraries. A testament to the success of concolic testing is 
its adaptation to a gamut of linguistic, platform, and application settings [6, 7, …]. 
Concolic testers draw on the power of SMT 
solvers. That is, at the end of a run of a user program, the concolic tester 
constructs a formula whose solution determines the next input. Alas, SMT solvers 
largely deal with first-order formulas that cannot capture higher-order properties 
of inputs. As a result, existing concolic testers struggle with JavaScript, Python 
or Racket components whose inputs are often higher-order functions and fall 
back to incomplete approximations [36]. 

The goal of this paper is to introduce provably correct foundations that 
lift concolic testing to the world of higher-order functions. 


© The Author(s) 2021 
N. Yoshida (Ed.): ESOP 2021, LNCS 12648, pp. 635-663, 2021. 
https://doi.org/10.1007/978-3-030-72019-3_23 

call-twice = λf. let i = f (equals 2) in 
                 let j = f (equals 30) in 
                 let k = f (equals 7) in 
                 (cond [!(i = 12) 1] 
                       [!(j = 5) 2] 
                       [!(k = -2) 3] 
                       [else error]) 

error-trigger = λg. (cond [(g 2) 12] 
                          [(g 30) 5] 
                          [else -2]) 


Figure 1: One Argument Call Is Not Enough; Example & Error-Triggering Input 


There are three interdependent challenges for the design of a correct higher- 
order concolic tester. First, a higher-order concolic tester needs to be able to 
generate sufficiently complex function inputs to explore the behavior of a user 
program. Even in simple higher-order programs, this set of inputs includes functions
with sophisticated structure. The left-hand side of figure 1 displays one such
program, call-twice. It consumes a higher-order function f that when given a 
predicate on numbers returns a number. It calls f with three different predicates 
that return true if their input is 2, 30 and 7 respectively. If the result of any 
of these calls is different than a specific number, call-twice terminates suc- 
cessfully; otherwise call-twice errors. Hence, only a fine-tuned input can make 
call-twice error. In particular, it has to be a function that calls its argument 
at least twice with different numbers and returns the right result in each case, 
like the counterexample on the right-hand side of the figure. 
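
The example of figure 1 can be transliterated into Python to check that the counterexample really drives call-twice to error. This is only an illustrative sketch: the names ProgramError and reaches_error are our own, and raising an exception stands in for the paper's error term.

```python
class ProgramError(Exception):
    """Stands in for the paper's `error` term (an assumption of this sketch)."""


def equals(n):
    # Predicate on numbers: true exactly when its input is n.
    return lambda m: m == n


def call_twice(f):
    # Transliteration of call-twice from figure 1.
    i = f(equals(2))
    j = f(equals(30))
    k = f(equals(7))
    if i != 12:
        return 1
    if j != 5:
        return 2
    if k != -2:
        return 3
    raise ProgramError("reached error")


def error_trigger(g):
    # Transliteration of error-trigger: it probes its argument with 2 and 30.
    if g(2):
        return 12
    if g(30):
        return 5
    return -2


def reaches_error(f):
    # Hypothetical helper: report whether call_twice errors on input f.
    try:
        call_twice(f)
    except ProgramError:
        return True
    return False
```

A constant function such as `lambda g: 0` leaves through the first branch, whereas `error_trigger` passes all three checks because it inspects its predicate argument at two different numbers.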

The second challenge is that a higher-order concolic tester needs to be able to 
generate structurally complex function inputs in a directed manner. Specifically, 
to preserve the character of first-order concolic testing, a higher-order concolic 
tester must start with a default input that evolves, with each run of the user 
program and the help of an SMT solver, to a new input that aims to exercise 
a previously unexplored region of the program. Returning to the example from
figure 1, a higher-order concolic tester should start from a simple f such as a
constant function and then use hints from the evaluation of the example to add 
appropriate calls inside f that call f’s argument, targeting the last branch of 
call-twice’s cond expression. 


date< = λd1. λd2. (or ((date-year d1) < (date-year d2))
                      ((date-month d1) < (date-month d2))
                      ((date-day d1) < (date-day d2)))


main = λdates. (let sorted-dates = (sort dates date<) in ...)


Figure 2: Broken Argument for a Library Function. 


The third challenge is that, in a higher-order setting, concretization demands 
that the concolic tester is ready to concretize any call to a higher-order function. 
For example, the main function in figure 2 takes as input a list of dates, calls sort
with the comparison function date< and expects the results to be lexicographically
sorted (as there are many reasons why sorting is necessary we leave the
details to the imagination of the reader). If sort is a library function whose im- 
plementation is inaccessible, then the concolic tester has to concretize the call to 
sort and disable symbolic evaluation for the extent of that call. Unfortunately, 
date< does not implement the lexicographical order and discovering this requires 
the concolic tester to track symbolically the flow of values in and out of date< 
in order to generate a list of dates that exhibits the bug. In other words, the 
concolic tester should be able to perform “partial” concretization so that date< 
interacts with sort in a concrete manner while the evaluation of date< still
produces the symbolic information the tester needs. 
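
To see why date< from figure 2 is not the lexicographic order, the following sketch transliterates it into Python and compares it against tuple comparison on one pair of dates. Representing a date as a (year, month, day) tuple and the two sample dates are assumptions made for illustration only.

```python
def date_lt(d1, d2):
    # Transliteration of date< from figure 2; a date is (year, month, day).
    return d1[0] < d2[0] or d1[1] < d2[1] or d1[2] < d2[2]


def lexicographic_lt(d1, d2):
    # Python compares tuples lexicographically, which is the intended order.
    return d1 < d2


# Hypothetical witness: a later year paired with an earlier month.
a = (2021, 1, 1)
b = (2020, 2, 1)

both_directions = date_lt(a, b) and date_lt(b, a)
agrees_with_intended_order = date_lt(a, b) == lexicographic_lt(a, b)
```

For this pair date< answers true in both directions, so it cannot be a strict order, and it disagrees with the intended comparison. A list containing such a pair is the kind of input the concolic tester has to discover.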


Our paper contributes the first formal model for a concolic tester for 
higher-order functions that meets all three challenges: 


1. Inspired by the function application rules of unknown symbolic values in 
higher-order symbolic evaluation, we construct a novel set of canonical
functions that the concolic tester uses to generate inputs. We prove that if
a higher-order program under test errors for some input, there is a canonical 
input that triggers an error too (representation completeness). 


2. We devise input constraints to record at runtime facts about the structure 
of the generated function inputs separately from the first-order control flow 
path formulas from the symbolic evaluation of the user program. We spec- 
ify an input evolution process that captures how the concolic tester can use 
input constraints to iteratively search through the space of canonical func- 
tions with the help of an SMT solver. We establish that, relative to the 
completeness of the solver, the concolic tester can always start with a de- 
fault input and, through evolution, generate a counter-example, if one exists 
(search completeness). Furthermore the input evolution is directed by the 
input constraints that the concolic tester collects (directness). 


3. Building on top of higher-order contracts [6], we develop concretization 
that employs wrappers around higher-order functions that are consumed by 
library and other inaccessible code. The wrappers allow the concolic tester 
to maintain control of function inputs and evaluate their bodies symbolically 
while producing concrete values when they interact with code that the con- 
colic tester does not control. We prove that, in the presence of concretization, 
the search for the bug is not complete but the concolic tester still evaluates 
user programs consistently with respect to concrete evaluation (soundness). 


The remainder of the paper is organized as follows. Section 2 gives an in-depth
by-example presentation of our approach to higher-order concolic testing.
Section 3 presents our formal model and section 4 establishes its correctness
properties. Section 5 describes a proof-of-concept implementation of our model
that provides evidence that the model is a reasonable basis for the development
of effective higher-order testers. Finally, section 6 places our results in the context
of related work and section 7 offers some concluding thoughts.
2 Higher-order Concolic Testing by Example


The linguistic setting of our exposition of concolic testing is a small call-by-value 
dynamically-typed functional language without mutable state. Furthermore, we 
represent bugs explicitly as the term error and assume that user programs come 
with type-like input specifications. 


2.1 First-Order Concolic Execution in a Nutshell 


The goal of a concolic tester is to find values for the inputs to a user program
that cause the execution to reach error. To do so, the tester runs the user program
in a concolic loop with a different input for each loop iteration. There are
two differences between concolic evaluation and concrete evaluation. To explain 
them, consider the user program in the left-hand column of figure 3, where X
represents the numeric input. 


Left column (user program):

(cond
  [(X*X - X - 992 = 0)
   (cond
     [(X < 0) error]
     [else 12])]
  [else 11])

Middle column (first run):

Input: X ↦ 0
Log:
• cond: (false) <X*X - X - 992 = 0>

Right column (second run):

Input: X ↦ 32
Log:
• cond: (true) <X*X - X - 992 = 0>
• cond: (false) <X < 0>


Figure 3: A First, First-order Concolic Example 


The first difference is that, instead of concrete values, concolic evaluation 
utilizes values of the form <t>, where t is a first-order formula over the input 
variables that codifies the provenance of the value. Concretely, assume that in 
the first run of our example program the concolic tester picks the concrete input 
0. Instead of just starting the evaluation of the program by replacing X with 
0, the concolic machine keeps an environment that maps X to 0 and runs the 
program with the concolic value <X> as the input. The concrete counterpart of 
a concolic value can be computed from the concrete values in the environment 
and the (first-order) formula t at any point during concolic evaluation. 

To kick off concolic evaluation, the concolic machine evaluates the test expression
of the outer cond of the example. Specifically the primitive operation *
detects that its input is <X> and returns <X*X>. Even though the concrete counterparts
of both of these concolic values are 0, they bear a different relation to
the input X. The concolic machine proceeds with the rest of the evaluation of the
test expression, yielding <X*X - X - 992 = 0>. At this point, the concolic machine
uses the concrete counterpart of the concolic value and thus decides to follow 
the “else” branch of the outer cond. Hence, the first run does not trigger error. 
The second difference is the connections that concolic evaluation creates
between the inputs and the evaluation of a user program. Specifically, the concolic
machine logs the concolic value of the test expressions of cond expressions
in the user program in the order they are evaluated; we refer to these entries of
the log as path constraints. The middle column of figure 3 shows the log (and the
inputs) for the run of our example when X is 0. Since only one cond expression is
evaluated, the log contains a single path constraint stating that the concrete counterpart
of the concolic value <X*X - X - 992 = 0> is false, that is, the first branch of the cond
was not taken. Intuitively, the path constraint connects the evaluation of a cond
expression with the input to the program via concolic values. After the first run,
the concolic tester asks the SMT solver for an input where X*X - X - 992 = 0 holds,
forcing the branch to go the other way. The SMT solver may respond with 32,
leading to the run represented in the right-hand column of figure 3. That run
again fails to trigger the error, but has a log showing that the first branch of
the outer cond was taken this time because X*X - X - 992 = 0 is true. It also has
another constraint that indicates that the first branch of the inner cond was not
taken because X < 0 is false. At this point, the concolic tester can formulate a
new SMT problem that requires both X*X - X - 992 = 0 and X < 0 to be true. The
problem is satisfiable and the SMT solver replies that the new concrete value for
X should be -31, which uncovers the error.


2.2 From Numbers to Function Inputs 


As described so far, concolic testing cannot handle inputs that are not numbers 
or other data types that SMT solvers understand. The concolic tester relies solely 
on a solver to generate new inputs and for that it needs to prepare a first-order 
problem that the solver can solve. Our first insight to surpass this restriction is 
to split the generation of function inputs into two subproblems: 


1. testing programs with first-order function inputs; and
2. testing programs with higher-order function inputs. 


As with many problems that involve higher-order functions, the first subproblem 
is the hard one. The solution for the second subproblem falls out of that for the 
first one, exploiting the natural co- and contravariance of higher-order functions. 
So, we first focus on first-order function inputs and we return to higher-order
inputs in section 2.5.

The left-hand column in figure 4 shows a program whose input F is a first-order
function from numbers to numbers. One of the many functions that can
trigger error in this example is λx. 2-x. However, a key aspect of our approach
is recognizing that we care only about the behavior of the input when given 1
and 2. Since the program calls F with only those arguments, other arguments are
irrelevant. In general, any program that terminates calls its input a finite number
of times, so the concolic tester can model first-order function inputs as functions
that look up values from a table, which we represent with a case expression.

As with non-function inputs, the concolic tester starts with the simplest
possible function input: λx. (case x), as shown in the middle column of figure 4.
Left column (user program):

(cond
  [((F 1) * 3 = (F 2) + 3) error]
  [else 11])

Middle column:

Input: F ↦ λx. (case x)
Log:
• call: (F 1)
• call: (F 2)
• cond: (false) <0*3 = 0+3>

Right column:

Input: F ↦ λx. (case x [1 <Y>] [2 <Z>]), Y ↦ 0, Z ↦ 0
Log:
• call: (F 1)
• call: (F 2)
• cond: (false) <Y*3 = Z+3>


Figure 4: First-order Input 


This function looks up its argument in an empty table and always returns 0. If
the concolic machine treated this function as a first-order input, it would record
that the first branch of cond was not taken because <(F 1) * 3 = (F 2) + 3> is false.
This formula, however, involves function symbols which SMT solvers cannot 
handle when higher-order functions come into play. Thus the concolic machine 
does not record the constraint and instead simply reduces all applications of F 
en route to the concolic value of the test expression. Unfortunately, this first 
function input does not help the concolic tester make progress. Since the input 
returns the constant 0 for any argument, the concolic value of the test loses any 
connection to F and the concolic tester does not have much leverage to adjust 
F’s behavior and affect the evaluation of the program. 

To rectify the situation, our concolic tester aims to generate a new input with
a shape that gives the tester increased control over F’s behavior. The pivotal
idea that enables the input evolution process is that the concolic machine logs
so-called input constraints. That is, in addition to the path constraints of the
user program, it also records the values that the user program provides to F, or
any other function input. Back to the example, the evaluation records two input
constraints: one for argument 1 and one for 2. The middle column of figure 4
shows the new log entries along with the path constraint from the evaluation of
the cond expression.

With the input constraints from the log, the concolic tester can construct a
second function input as shown in the right-hand column of figure 4. This new
function input has a case expression with two clauses: one for when the argument
is 1 and one for when it is 2. Furthermore the concolic tester introduces two fresh
input variables Y and Z as the actions of the two clauses. The initial values for
these two new inputs are both 0. However, exactly because the results of the
function are input variables rather than mere constants, the concolic tester can
configure the values for these inputs to trigger the error with the help of an SMT
solver. Specifically, the concolic value of the test of the first branch of the cond
expression in the example becomes <Y*3 = Z+3>, as shown in the log. This problem
has solutions and the SMT solver discovers that Y=1 and Z=0 are sufficient to
“switch” the evaluation of the conditional, which triggers the error.
In sum, to handle first-order function inputs, the concolic tester starts with
the simplest possible function, records input constraints that describe the ar- 
guments that the function consumes, uses the constraints to generate a new 
function that, in turn, introduces fresh inputs, and finally employs the SMT 
solver to fine-tune the values for these inputs. 
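
The steps summarized above can be illustrated with a small Python sketch for the program of figure 4. It is not the paper's concolic machine: function inputs are modeled as table lookups, the log only records the arguments passed to F, and a brute-force search over small non-negative numbers replaces the SMT solver. The helper names table_function, record_calls, and search are our own.

```python
from itertools import product


def table_function(table, env):
    """Model of λx. (case x [k <V>] ...): look x up in `table`, whose
    actions are input-variable names resolved in `env`; the default is 0."""
    return lambda x: env[table[x]] if x in table else 0


def program(F):
    # Figure 4: (cond [((F 1) * 3 = (F 2) + 3) error] [else 11])
    return "error" if F(1) * 3 == F(2) + 3 else 11


def record_calls(F, log):
    """Wrap F so that every argument it receives is logged."""
    def wrapped(x):
        log.append(x)
        return F(x)
    return wrapped


# Iteration 1: the empty table; the log collects the input constraints.
log = []
first_result = program(record_calls(table_function({}, {}), log))

# Iteration 2: one clause per logged argument, each with a fresh variable.
names = ["Y", "Z"]
table = dict(zip(log, names))


def search(bound=10):
    # Brute-force stand-in for the SMT query over the fresh variables.
    for values in product(range(bound), repeat=len(names)):
        env = dict(zip(names, values))
        if program(table_function(table, env)) == "error":
            return env
    return None


solution = search()
```

Under these assumptions the first run returns 11 and logs the arguments 1 and 2, and the search then settles on Y=1 and Z=0, the same values mentioned above.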

As a final remark in this section, function inputs are regular functions that
behave like a concrete input would behave. For the concolic machine though, the
evaluation of their bodies is a source of new information that powers the subsequent
iterations of the concolic loop. This is a key observation for concretization
in our setting. A concolic tester concretizes calls to functions when it cannot
evaluate their bodies in a concolic manner. This situation arises when the
function comes from an external library, such as sort from figure 2, and the
function’s code is not under the control of the concolic machine. In the context
of this section, this translates to the situation where the function’s body cannot
interact with any concolic values nor can its evaluation record path constraints
in the log of the machine. A naive solution to the issue is that the concolic
machine computes the concrete counterpart of the argument, delegates the call
of the function to a concrete machine and then uses the result of the concrete
call to proceed. This means, however, that the concolic machine loses any constraints
from the evaluation of the body of the argument if the argument is a
function itself. Instead, our concolic machine uses a proxy argument for the concrete
call that wraps the actual argument. Thus calls to the argument go back to
the concolic machine that records all the usual constraints and only concretizes
any first-order results the argument produces. We return to our approach to
concretization in section 3.
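
The proxy idea can be approximated in Python, with the built-in sorted playing the role of the inaccessible library function. This is a loose sketch under several assumptions: a plain list stands in for the log of the concolic machine, the proxy records concrete arguments and results instead of concolic values, and the helpers make_proxy and lt_to_cmp are hypothetical names.

```python
from functools import cmp_to_key


def date_lt(d1, d2):
    # The comparison from figure 2; a date is a (year, month, day) tuple.
    return d1[0] < d2[0] or d1[1] < d2[1] or d1[2] < d2[2]


def make_proxy(fn, log):
    """Wrap fn so that each call made by library code is observed.
    The library only ever sees the concrete result."""
    def proxy(*args):
        result = fn(*args)
        log.append((args, result))
        return result
    return proxy


def lt_to_cmp(lt):
    # Adapter from a less-than predicate to the cmp protocol used by cmp_to_key.
    def cmp(a, b):
        if lt(a, b):
            return -1
        if lt(b, a):
            return 1
        return 0
    return cmp


log = []
dates = [(2021, 1, 1), (2020, 2, 1)]
proxied = make_proxy(date_lt, log)
sorted_dates = sorted(dates, key=cmp_to_key(lt_to_cmp(proxied)))
```

Even though the body of sorted is opaque to the tester, every comparison it performs passes through the proxy and therefore ends up in the log.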


2.3 Input Interactions 


The previous example supplies a constant number to F. However, programs can
also supply other, first-order inputs to their function inputs, as in the example
in the left-hand column of figure 5.


Left column (user program):

(cond
  [((F X) * 3 = (F (X*2)) + 3) error]
  [else 11])

Middle column:

Input: F ↦ λx. (case x), X ↦ 0
Log:
• call: (F <X>)
• call: (F <X*2>)
• cond: (false) <0*3 = 0+3>

Right column:

Input: F ↦ λx. (case x [<X> <Y>] [<X*2> <Z>]), X ↦ 1, Y ↦ 0, Z ↦ 0
Log:
• call: (F <X>)
• call: (F <X*2>)
• cond: (false) <Y*3 = Z+3>


Figure 5: Interacting Inputs 
In order to trigger the error in this example, F must be able to return different
results from its two different calls. However, if the initial concrete value for X is 0,
the concrete counterparts of the arguments to the calls to F are the same for both
calls. Thus, if the concolic machine logs only the concrete counterparts of the
arguments as part of input constraints, the concolic tester loses the connection
between X and the values that the user program passes to F. Instead, the concolic
machine uses the concolic values when logging input constraints. As shown in
the log in the middle column of figure 5, the concolic values of the arguments to
the two calls to F are <X> and <X*2>. Thus, the concolic tester can extend the case
of F with two clauses, one for when the concrete counterpart of the argument
of F matches that of <X> and one for when it matches that of <X*2>. The effect of
this extension is that any problems the concolic tester sends to the SMT solver
contain the additional constraint that X and X*2 are different. Consequently, in
a manner similar to the previous example, the concolic tester eventually uses 1
as the concrete value for X and discovers the error. The right-hand column of
figure 5 displays this counter-example.


2.4 Blind Extensions Are Not Enough 


So far we have seen how the concolic tester uses input constraints and concolic 
values to extend the case expression of a first-order function input. However, the 
extension may lead the concolic tester to a dead-end. This is a subtle point that, 
unfortunately, requires a complex example to illustrate. Figure 6 contains the
simplest one we know. 

This example is complex enough that it deserves a brief walkthrough. To 
start, note that it has two inputs, F, a function from numbers to numbers, and 
X, a number, and that reaching the error requires that the tests of all of the 
branches of the cond expression of the example fail. In effect, the condition for 
triggering error is the conjunction of the four formulas that follow the negations 
in the example. To confirm that this example does have an error-triggering input,
take X to be -10 and F to be λx. 11 * (x+11).

If the concolic tester follows the process described so far in this section, it
manages to generate an input that makes the tests of the first three branches
of cond fail. But then, it seems impossible for the concolic tester to extend
the input further to make the test of the fourth branch of cond fail as well. To see
how this plays out, the middle column in figure 6 shows the state of the concolic
machine after a few iterations of the concolic loop. The concolic tester first runs
the example with the default constant zero function as the input, which results
in 1 and logs the argument X for F; the concolic tester then extends the case
of F with a clause that returns a fresh concolic variable Y. It then discovers that
Y must be set to 11 to skip the first branch in the cond expression. For this
input, the example produces result 7, failing to also skip the second branch of
the cond expression. After another iteration, the concolic tester manages to skip
the second branch of cond and generates the input shown in the middle column
of figure 6.
Left column (user program):

(cond
  [!(F X = 11) 1]
  [!(F 0 = 121) 7]
  [!(F (X+10) = 121) 2]
  [!(X = -10) 9]
  [else error])

Middle column:

Input: F ↦ λx. (case x [<X> <Y>] [0 <Z>]), X ↦ 1, Y ↦ 11, Z ↦ 121
Log:
• call: (F <X>)
• cond: (false) <!(Y = 11)>
• call: (F 0)
• cond: (false) <!(Z = 121)>
• call: (F <X+10>)
• cond: (true) <!(0 = 121)>

Right column:

Input: F ↦ λx. (case x [<X> <Y>] [0 <Z>] [<X+10> <W>]), X ↦ 1, Y ↦ 11, Z ↦ 121, W ↦ 121
Log:
• call: (F <X>)
• cond: (false) <!(Y = 11)>
• call: (F 0)
• cond: (false) <!(Z = 121)>
• call: (F <X+10>)
• cond: (false) <!(W = 121)>
• cond: (true) <!(X = -10)>


Figure 6: A Complex, Subtle Example 


Let us analyze the middle section of the figure to understand the concolic
tester’s state at this point in the process. The input consists of a function F
that returns <Y> when it sees the input <X>, where Y is 11, and returns <Z> when
it sees 0, where Z is 121. When we feed this input to the program in the left-hand
column, we skip the first and second branches of the cond, because F has
been tuned to get through them. This part of the execution produces the first
four entries in the log. Next the concolic machine arrives at the third branch of
the cond and the call (F (X+10)), which produces the fifth entry in the log. The
concrete value of the argument is 11, which has no matching clause in the case of
F, so F returns 0 and the program terminates with 2, following the third branch
as recorded in the last entry in the log.


The straightforward next step is to insist that this third call has its own
distinct clause in F, meaning the concolic engine asks the solver for a solution
to the equations !(X=0) and !(0=X+10). An input based on the solution to these
equations is shown in the third column of figure 6, and it too deserves a careful
look. The log is identical up to the last “call” entry so the program evaluates
the same to that point. The next entry in the log (second to last) reveals that the
concolic machine skips the third branch of the cond and thus proceeds with the
evaluation of the test X=-10 of the fourth branch. Since the value for the input
X is 1, the machine follows the branch and the program returns 9.


Clearly, since we want the machine to skip the fourth branch too, the tester
should present to the solver the same set of equations that led to the latest
input and assert in addition X=-10. Unfortunately, there is no solution to these
equations since they already contain !(X=-10), in the form !(0=X+10), because the
second and third clauses of F are distinct.

While it is usually a good choice for the concolic tester to force the arguments
the user program provides to function inputs to be distinct, in some cases, like
this one, it is necessary to do otherwise. Indeed, to be able to reach error in
this example, we need to improve the concolic tester’s capabilities.
More precisely, the concolic tester needs to be able to take a new argument and
force it into an existing clause rather than adding a new one. In this example, if
the concolic tester forces the argument X+10 and the argument 0 to match the
same clause, then it can add the equation 0=X+10 to the problem it presents
to the SMT solver at the end of the iteration of the concolic loop described in
the middle column of figure 6. This extra equation no longer clashes with the
equation X=-10 that is necessary to skip the fourth branch of the user program, and
with the help of the SMT solver, the tester can adjust the input in the middle
column of figure 6 to use -10 as X and trigger the error.

To sum up, at the end of each iteration of the concolic loop there are multiple
ways a first-order input can evolve. The concolic tester can use the logged input
constraints to assert to the SMT solver that the arguments of a call to the input
are different from those of some other calls and extend the case expression of
the input accordingly (section 2.2 to section 2.3). Or, it can assert to the SMT
solver that the arguments of two calls to the input are equal (section 2.4). In
either case, the concolic tester asks the SMT solver to determine the values
of first-order inputs. We revisit formally the evolution of inputs in section 3.
As a concluding note, we underline that the concolic tester may have to try any
number of the possible ways an input can evolve. The strategy the concolic tester
uses to prioritize and search the space of these possibilities is out of the scope of
this paper. Herein, we focus instead on what the concolic tester can do at each
point in the concolic loop and whether a sequence of its choices is guaranteed to
reveal a possible error in a user program.
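
The two ways of evolving the input of figure 6 can be contrasted with a small Python sketch. It fixes the clause results to the values found in earlier iterations (11, 121, 121) and searches a small range of X by brute force instead of calling an SMT solver; the range, the helper names, and the merge flag are assumptions made for illustration.

```python
def program(F, X):
    # Transliteration of the user program in figure 6.
    if not F(X) == 11:
        return 1
    if not F(0) == 121:
        return 7
    if not F(X + 10) == 121:
        return 2
    if not X == -10:
        return 9
    return "error"


def table_function(clauses):
    """clauses is a list of (key, result); the first match wins, default 0."""
    def F(x):
        for key, result in clauses:
            if x == key:
                return result
        return 0
    return F


def search(merge, domain=range(-20, 21)):
    """merge=False: the third call gets its own clause, so all keys must differ.
    merge=True: the third call is forced into the clause for 0 (0 = X+10)."""
    for X in domain:
        if merge:
            if X + 10 != 0 or X == 0:
                continue
            clauses = [(X, 11), (0, 121)]
        else:
            if len({X, 0, X + 10}) != 3:
                continue
            clauses = [(X, 11), (0, 121), (X + 10, 121)]
        if program(table_function(clauses), X) == "error":
            return X
    return None


distinct_result = search(merge=False)
merged_result = search(merge=True)
```

With pairwise distinct clauses the search finds nothing, because distinctness rules out X = -10, while forcing the third call into the existing clause for 0 yields X = -10 and reaches error.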


2.5 Higher-order Inputs 


Handling higher-order inputs, that is functions that consume and/or return other 
functions, not just numbers, requires a generalization of the ideas in the previous 
section. However, the seed of the key insight is already there in the way our 
concolic tester handles first-order function inputs. Intuitively, the tester treats a 
first-order function input as a source of new, latent inputs that the concolic tester 
provides to the user program. As we discuss above this is exactly the rationale 
for the fresh input variables that appear in the actions of the case expressions 
of first-order function inputs. 

Contravariantly, when an input consumes a function argument, the tester 
can simply treat the function argument as a source of further, latent arguments 
that the user program provides. The input can decide how and when to call 
its function argument in order to obtain these latent arguments. These function 
calls, in turn, open up new points where the concolic tester supplies additional 
inputs to the user program. 
Left column (user program):

(cond
  [((G (λx. x+1)) + (G (λx. x+2)) = 9) error]
  [else 11])

Middle column:

Input: G ↦ λf. <X>

Right column:

Input: G ↦ λf. let fY = f <Y> in
               (case fY [<Y+1> <X>]
                        [<Y+2> <Z>])


Figure 7: Co- & Contravariance at Work 


Concretely, consider the left-hand program in figure 7. It has one input, G, which
consumes a function f on numbers and returns a number. As before, the concolic
tester starts out by generating the constant zero function. Of course, this does not
uncover the error so, same as for first-order function inputs, the concolic tester
turns to the input constraints in its log. However, the log simply shows that
the user program provides G with two procedures. Therefore the case-expression
approach does not apply in a straightforward manner. The concolic tester can
change the input G to return a fresh input variable X as in the middle column
of figure 7. Unfortunately, this still does not help trigger the error.

While many programming languages offer a certain notion of physical equal- 
ity for procedures, our approach is for the concolic tester to generate a function 
G that calls its argument f and then inspects the result fY with a case expres- 
sion as if it was yet another argument to G. In this case, G calls f with a fresh 
input variable Y then binds the result to fY which acts as a latent argument that 
the user program provides to G. To account for latent arguments, we generalize 
input constraints to keep track of variables such as fY together with the results 
of calls to function arguments. 

The overall effect is that the concolic tester acquires the vantage point it needs
to follow the same process as for first-order function inputs. In particular, the
input constraints for fY contain the results from calling f that in turn are tied to
input variable Y and thus under the control of the concolic tester. Furthermore,
just like for first-order functions, they provide guidance for filling in the clauses
of the case expression of G. Concretely in our example, the input constraints for
fY record that it is equal to either <Y+1> or <Y+2>, which the concolic tester can
consider as distinct and, with the help of the SMT solver, generate the G on the
right-hand side of figure 7 that triggers the error, where X and Z are fresh input
variables mapped to 4 and 5 respectively.
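
The higher-order input on the right-hand side of figure 7 can be mimicked in Python as follows. The sketch is only illustrative: the environment is a plain dictionary, the value 0 for Y is an assumption (the text only fixes X and Z), and make_G is a hypothetical helper name.

```python
def program(G):
    # Figure 7: (cond [((G (λx. x+1)) + (G (λx. x+2)) = 9) error] [else 11])
    return "error" if G(lambda x: x + 1) + G(lambda x: x + 2) == 9 else 11


def make_G(env):
    """Model of λf. let fY = f <Y> in (case fY [<Y+1> <X>] [<Y+2> <Z>])."""
    def G(f):
        fY = f(env["Y"])             # latent argument obtained by calling f
        if fY == env["Y"] + 1:
            return env["X"]
        if fY == env["Y"] + 2:
            return env["Z"]
        return 0                     # no clause matches
    return G


constant_zero = program(lambda f: 0)
with_solution = program(make_G({"Y": 0, "X": 4, "Z": 5}))
```

The constant zero function ends in the else branch, whereas the canonical G distinguishes its two arguments by their results on Y and returns 4 and 5, which sum to 9.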

Overall, the concolic tester handles function inputs by decomposing them one
layer at a time until it ends up with first-order functions. At each point of decomposition,
that is when an input calls one of its arguments, the concolic tester
introduces fresh input variables and logs input constraints that connect the fresh
input variables and the calls’ results. Then it keeps track of these connections
with input constraints and uses the constraints to fill in the case expressions
in the bodies of higher-order function inputs. Effectively, this approach entails
that the concolic tester considers inputs in a so-called canonical form only. Informally,
canonical inputs nest let-expressions and case-expressions. The precise
definition of canonical functions and their evolution are the subject of section 3,
along with the rest of the model for higher-order concolic testing.
3 Formalizing Higher-order Concolic Testing


The core of our formal model of higher-order concolic testing is a concolic (ab- 
stract) machine that loads and runs user programs, and the input evolution 
metafunction that generates inputs for the next run. 


[Diagram: the user program e and the input environment ρ : X → n or CF (i.e.
canonical functions) feed into loading with inputs, L[ρ, e]. Concolic evaluation,
<ρ, [], L[ρ, e]> →* <ρ, π, e'>, produces logs that feed into evolution,
<ρ', π'> ∈ evolve[ρ, π], which supplies the next input environment. The cycle
ends with a counter-example, ρ.]


Figure 8: The Full Input Evolution Cycle


Figure 8 depicts how the concolic machine and the input evolution metafunction
work together to form the concolic loop. At the beginning of each iteration
of the loop, the load metafunction L consumes the environment ρ that maps each
input variable X in the user program e to a value and prepares the user program
for the concolic machine. The concolic machine evaluates the loaded program
with the help of two registers: the environment of inputs ρ and the log π (that
is initially empty). If the result of the evaluation is not an error, the final content
of log π together with the environment ρ determine how the input evolves.
Specifically, the evolve metafunction uses them to compute a list of pairs that
each contains a new environment of inputs ρ' and a prediction of the contents
of the log π' of the concolic machine after evaluating the program with ρ'. The
concolic loop repeats and, with each iteration, explores one more input. When it
discovers an error in the user program, the loop terminates and the environment
of the error-generating input turns into a concrete counter-example.
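
The cycle described above can be outlined as a generic driver in Python. This is a schematic sketch, not the formal model: run and evolve are parameters supplied by the caller, the worklist discipline (first in, first out) is an arbitrary choice, and the toy program that errors when X * 2 = 14 is a hypothetical example with a brute-force search in place of an SMT solver.

```python
from collections import deque


def concolic_loop(run, evolve, initial_env, max_iters=100):
    """Explore input environments until `run` reports an error."""
    worklist = deque([initial_env])
    for _ in range(max_iters):
        if not worklist:
            return None
        env = worklist.popleft()
        result, log = run(env)               # concolic evaluation
        if result == "error":
            return env                       # concrete counter-example
        # evolve yields pairs of (new environment, predicted log).
        for new_env, _predicted_log in evolve(env, log):
            worklist.append(new_env)
    return None


# Toy instance (hypothetical): the program errors when X * 2 = 14.
def toy_test(env):
    return env["X"] * 2 == 14


def toy_run(env):
    taken = toy_test(env)
    return ("error" if taken else 0), [(toy_test, taken)]


def toy_evolve(env, log):
    test, taken = log[-1]
    for x in range(-20, 21):                 # brute force, not an SMT solver
        candidate = {"X": x}
        if test(candidate) != taken:
            return [(candidate, log[:-1] + [(test, not taken)])]
    return []


counterexample = concolic_loop(toy_run, toy_evolve, {"X": 0})
```

Keeping the predicted log next to each new environment mirrors the pairs computed by evolve, even though this toy driver does not inspect the prediction.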


Section 3.1 details the concolic machine; the subsections that follow formalize
the evolution function and extend the model with concretization.


3.1 From User Programs to Concolic Evaluation 


op ::= ! | + | - | * | < | = | integer? | procedure?
e ::= n | error | x | X | (λx. e) | op e | op e e | e e | (cond [e e] ... [else e])
X, Y, Z, F, G, etc., are concolic variables.


Figure 9: The Syntax of User Programs 
CF ::= (λx. case_x)
case_x ::= (case^ℓ x) | (case^ℓ x [procedure? e°]^ℓ [<t> e°]^ℓ ...)


e° ::= v° | (let z = f v° in case_z)
v° ::= x | <X> | CF


Figure 10: Canonical Functions 


Figure 9 collects the constructs of the language of user programs, including
numbers n, error, primitive operators op e ..., multi-way conditional expressions
cond, and uppercase variables X, Y, F, etc., for the inputs of a user program.
These inputs are either numbers or, as we discuss briefly in Section 2.5, functions
in canonical form. The error construct represents actual bugs in user programs;
dynamic type errors manifest themselves as stuck terms.

Figure 10 provides the formal definition of canonical functions. The body of
a canonical function with argument x is a caseₓ expression with zero or more
clauses. As mentioned earlier, a caseₓ that has no clauses is equivalent
to the constant 0. Unlike the presentation in Section 2, and due to the
dynamically-typed nature of our model, the very first clause of every non-empty
caseₓ always checks whether x is a function. If x is a function f then, similar to the
discussion in Section 2.5, the action e° of the procedure? clause is typically a let
expression that applies f and inspects the result of the application z with yet
another case expression.¹ If x is a number then caseₓ compares x with each of
the concolic values <t> and delegates to the corresponding action e°. Similar to the
earlier examples, the argument v° for f in a let expression is an input, i.e.,
a concolic value <X> where X is a fresh concolic variable, or a canonical function.
The same goes for the actions e° of a non-procedure? clause of a case expression.
However, in these positions the model can also use variables in scope in an
attempt to identify a counter-example for a user program with fewer concolic
loop iterations, which is helpful when proving the metatheoretical properties of
the model. In general, despite their restricted shape, canonical functions can
simulate any function input that triggers an error in a user program. We return
to this point in Section 4.

As a final remark on canonical functions, one important difference from the 
discussion of function inputs in Section 2.5 is that, herein, each case expression
comes with labels ℓ. There are two kinds of labels: labels that uniquely identify
a case expression and labels that uniquely identify a clause of a case. As we 
explain further on, their purpose is to allow the concolic tester to analyze the 
log of the concolic machine after each iteration of the concolic loop to direct the 
evolution of a canonical function. 

Figure 11 shows the complete definition of the concolic machine. As we mention
at the beginning of this section, the machine has three registers: the input
environment ρ that maps concolic variables X to either numbers or canonical
functions; the log of constraints π; and the term e the machine evaluates.


¹ We use let x = e₁ in e₂ as shorthand for (λx. e₂) e₁.
M ::= ⟨ρ, π, e⟩

ρ : X → n or CF

π ::= [φ, ...]
φ ::= ⟨"R-COND", "FALSE", <t>⟩ | ⟨"R-COND", "TRUE", <t>⟩
    | ⟨"R-CASE", ℓ, v, "MISS"⟩ | ⟨"R-CASE", ℓ, v, "HIT": ℓ⟩

t ::= X | n | !t | op t t
e ::= <t> | error | x | (λx. e) | op e | op e e | e e | (cond [e e] ... [else e])
    | (case^ℓ e) | (case^ℓ e [procedure? e]^ℓ [<t> e]^ℓ ...)

v ::= (λx. e) | <t>

E ::= []
    | op E | op E e | op v E | E e | v E | (cond [E e] [e e] ... [else e])
    | (case^ℓ E) | (case^ℓ E [procedure? e]^ℓ [<t> e]^ℓ ...)

ℒ : ρ e → e   (interesting cases)
  ℒ[ρ, n] = <n>
  ℒ[ρ, X] = <X>         where ρ(X) = n
  ℒ[ρ, F] = λx. caseₓ   where ρ(F) = (λx. caseₓ)

ℰ : ρ t → n
  ℰ[ρ, n] = n
  ℰ[ρ, X] = n           where ρ(X) = n
  ℰ[ρ, op t ...] = n    where op ∈ {!, +, -, *, <, =}, δ[op, ℰ[ρ, t], ...] = n


Figure 11: The Concolic Machine and the Evaluation Language 


Evaluation terms e are user program terms extended with canonical functions
and concolic values <t>. Recall from Section 2 that the latter keep track of the
provenance of a value as a symbolic first-order formula t that an SMT solver
can handle. The concrete counterpart of a concolic value can be computed at
any point in the evaluation from t and the input environment ρ of the concolic
machine with the simple ℰ metafunction.
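
To make the role of this metafunction concrete, here is a minimal Python sketch. It assumes a representation that is not in the paper: a formula t is an int, an input name (a string), or a tuple (op, t, ...); booleans are encoded as 0 and 1, and ! is read as logical negation. The names DELTA and concrete are hypothetical.

```python
import operator

# Primitive operators with SMT counterparts; truth values are encoded as 0/1.
DELTA = {
    "!": lambda a: int(a == 0),
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "<": lambda a, b: int(a < b),
    "=": lambda a, b: int(a == b),
}

def concrete(env, t):
    """Compute the number denoted by formula t under the input environment env."""
    if isinstance(t, int):
        return t                      # a literal denotes itself
    if isinstance(t, str):
        return env[t]                 # a first-order input is looked up
    op, *args = t
    return DELTA[op](*(concrete(env, a) for a in args))
```

For example, under an environment that maps X to 4, the formula (< (+ X 1) 10) evaluates to 1.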

The log, π, of the concolic machine collects two kinds of constraints, φ. Path
constraints are either ⟨"R-COND", "FALSE", <t>⟩ or ⟨"R-COND", "TRUE", <t>⟩ and are logged
by evaluating cond expressions. The first indicates that the test of a branch failed
during concolic evaluation; the second that the test succeeded. In either case, the
concolic value of the test is <t> where the symbolic first-order formula t codifies
the necessary and sufficient condition for the test to succeed.

Input constraints, ⟨"R-CASE", ℓ, v, "HIT": ℓ'⟩ and ⟨"R-CASE", ℓ, v, "MISS"⟩, are logged
by evaluating case expressions in canonical functions. The label ℓ associates
each input constraint with a case expression in the input environment ρ. A
⟨"R-CASE", ℓ, v, "HIT": ℓ'⟩ constraint indicates that the case expression with label ℓ,
given value v, followed the action of its clause with label ℓ'. A ⟨"R-CASE", ℓ, v, "MISS"⟩
constraint indicates that the case with label ℓ, given value v, followed the "else"
clause that is implicit in our model and whose action is the constant 0. Since the first
thing a canonical function does when it interacts with the user program is to inspect
the value it receives with case, some of the values v in input constraints are exactly the
values that the user program provides to function inputs and consequently the
concolic tester. Others are the results of calls to functions of the user program
that higher-order function inputs perform with their let expressions, which are
also values that the user program provides to the concolic tester as we discuss
in Section 2.5. Hence the input constraints here supersede the simplified input
constraints presented earlier.


Since concolic evaluation handles concolic rather than concrete values, the
ℒ[ρ, e] metafunction prepares a user program e accordingly for the concolic
machine. It traverses e and replaces every integer n with <n>, concolic variables
X with <X> if ρ maps X to an integer and F with the actual function if ρ maps
F to a canonical function. Note that ℒ does not introduce any <F> since ρ(F) can
be a higher-order function which, in general, SMT solvers have no theory for.
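
The traversal can be sketched in Python as follows. The encoding is an assumption of this sketch: programs are nested tuples, integers are Python ints, a concolic value is the pair ("sym", t), and a canonical function stored in the environment is any non-integer value. The sketch also relies on the convention that input names never clash with bound variables or keywords.

```python
def load(env, e):
    """Prepare a user program for concolic evaluation.

    Integers and first-order inputs become concolic values ("sym", t);
    function inputs are replaced by the function stored in env.
    """
    if isinstance(e, int):
        return ("sym", e)
    if isinstance(e, str):
        if e in env:                              # an input X or F
            value = env[e]
            return ("sym", e) if isinstance(value, int) else value
        return e                                  # bound variable, operator, keyword
    return tuple(load(env, sub) for sub in e)     # compound term: load the parts
```

Note that a function input is inlined rather than wrapped in ("sym", ...), mirroring the remark that no <F> is ever introduced.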


Given a loaded program, the concolic machine operates in accordance with
the reduction rules from Figure 12. The rules can be divided into four groups.
Group SYM implements base-value provenance tracking for primitive operators.
For primitive operators that have straightforward SMT formula counterparts,
rule [R-TRACE1] produces a concolic value whose formula is formed by the
operator and the symbolic provenance of the operands. Otherwise, [R-TRACE2]
discards the provenance information of the operands and simply returns the
concolic value <n> where n is the concrete result of the operation.
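
The split between the two SYM rules can be illustrated with a small Python sketch. It assumes, for illustration only, that concolic values are pairs ("sym", t) and that functions are Python callables; apply_prim is a hypothetical name.

```python
def apply_prim(op, *vals):
    """Apply a primitive operator to values, tracking provenance when possible."""
    if op in ("integer?", "procedure?"):
        (v,) = vals
        is_proc = callable(v)
        hit = is_proc if op == "procedure?" else not is_proc
        return ("sym", int(hit))                     # R-TRACE2: provenance is dropped
    # R-TRACE1: the new formula combines the operator with the operands' formulas.
    return ("sym", (op, *(v[1] for v in vals)))
```

The type predicates return a plain 0 or 1 because their result has no direct counterpart in the first-order formulas the solver handles.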

The next group, COND, includes the rules for cond expressions. In general,
the concolic machine inspects the concrete counterpart of the value of the test
expression in the first clause of a cond to determine whether to take or skip a branch.
When ℰ[ρ, t] is non-zero, [R-CONDTRUE] proceeds with the action expression e₁
of the first clause and logs the path constraint ⟨"R-COND", "TRUE", <t>⟩. When ℰ[ρ, t]
is zero, rule [R-CONDFALSE] drops the first clause of the cond and appends
the path constraint ⟨"R-COND", "FALSE", <t>⟩ to the list of path constraints. If cond
has no other clauses but the else one, [R-CONDELSE] replaces the conditional
expression with the action expression e of its else clause.
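
The three COND rules can be read as one loop over the clauses. The Python sketch below simplifies the machine by assuming that every test has already been reduced to a formula; concrete is a toy evaluator for literals and input names, and step_cond is a hypothetical name.

```python
def concrete(env, t):
    # Toy evaluator: a formula is an integer literal or an input name.
    return t if isinstance(t, int) else env[t]

def step_cond(env, log, clauses, else_action):
    """Select a branch of (cond [t action] ... [else else_action]).

    Returns the extended log and the chosen action.
    """
    log = list(log)
    for t, action in clauses:
        if concrete(env, t) != 0:
            log.append(("R-COND", "TRUE", t))     # R-CONDTRUE
            return log, action
        log.append(("R-COND", "FALSE", t))        # R-CONDFALSE
    return log, else_action                       # R-CONDELSE logs nothing
```

Every skipped branch leaves a "FALSE" entry in the log, which is what later allows evolve to target that branch.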


The third group, CASE, describes the evaluation of case expressions from
canonical functions. When evaluating a case expression, the concolic machine
searches the clauses for a match. If the case expression is empty or if the input v
is a concolic value whose concrete counterpart is a number that differs from the
tests of all clauses, [R-CASEMISS1] and [R-CASEMISS2] (respectively) reduce
the case expression to the default action expression <0>. They also append the
input constraint ⟨"R-CASE", ℓ, v, "MISS"⟩ to the log. Otherwise, the last two rules of
the group handle successful matches. For cases where the input v is a function
λx. e, [R-CASEHIT1] reduces case to the action expression e₁ of its first clause.
For cases where the input v is a concolic value <t>, rule [R-CASEHIT2] selects
the matching clause with label ℓᵢ and reduces case to the corresponding action
eᵢ. Both rules log the input constraint ⟨"R-CASE", ℓ, v, "HIT": ℓ'⟩ with the label ℓ of
the case expression, the input v and the label ℓ' of the matching clause.
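
A compact Python reading of the four CASE rules follows. The representation is assumed for this sketch: a function input is any Python callable, a first-order input is a formula (an int or an input name), clauses are triples (clause_label, test, action) whose first entry is the procedure? clause, and the default action <0> is rendered as 0.

```python
def concrete(env, t):
    # Toy evaluator: a formula is an integer literal or an input name.
    return t if isinstance(t, int) else env[t]

def step_case(env, log, label, v, clauses):
    """Evaluate one case expression; returns (extended_log, action)."""
    if not clauses:                                    # R-CASEMISS1: empty case
        return log + [("R-CASE", label, v, "MISS")], 0
    if callable(v):                                    # R-CASEHIT1: first clause
        first_label, _test, action = clauses[0]
        return log + [("R-CASE", label, v, ("HIT", first_label))], action
    n = concrete(env, v)
    for clause_label, test, action in clauses[1:]:     # skip the procedure? clause
        if concrete(env, test) == n:                   # R-CASEHIT2: first match wins
            return log + [("R-CASE", label, v, ("HIT", clause_label))], action
    return log + [("R-CASE", label, v, "MISS")], 0     # R-CASEMISS2
```

Scanning the clauses in order corresponds to the side condition of [R-CASEHIT2] that no earlier clause matches.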

The last group, OTHER, completes the definition of the reduction rules. Rule
[R-APP] is the standard call-by-value β-reduction while rules [R-ERROR] and
[R-CTXT] close the rules over evaluation contexts.
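Group SYM

  op ∈ {!, +, -, *, <, =}
  ------------------------------------------------ [R-TRACE1]
  ⟨ρ, π, op <t> ...⟩ ⟶ ⟨ρ, π, <op t ...>⟩

  op ∈ {integer?, procedure?}
  ------------------------------------------------ [R-TRACE2]
  ⟨ρ, π, op v⟩ ⟶ ⟨ρ, π, <tag[op, v]>⟩

  tag : op v → 0 or 1
  tag[integer?, <t>] = 1
  tag[integer?, (λx. e)] = 0
  tag[procedure?, <t>] = 0
  tag[procedure?, (λx. e)] = 1

Group COND

  ℰ[ρ, t] ≠ 0    π' = π ++ [⟨"R-COND", "TRUE", <t>⟩]
  ------------------------------------------------------------------ [R-CONDTRUE]
  ⟨ρ, π, (cond [<t> e₁] [e₂ e₂'] ... [else e₃])⟩ ⟶ ⟨ρ, π', e₁⟩

  ℰ[ρ, t] = 0    π' = π ++ [⟨"R-COND", "FALSE", <t>⟩]
  ------------------------------------------------------------------ [R-CONDFALSE]
  ⟨ρ, π, (cond [<t> e₁] [e₂ e₂'] ... [else e₃])⟩ ⟶
  ⟨ρ, π', (cond [e₂ e₂'] ... [else e₃])⟩

  ------------------------------------------------ [R-CONDELSE]
  ⟨ρ, π, (cond [else e])⟩ ⟶ ⟨ρ, π, e⟩

Group CASE

  π' = π ++ [⟨"R-CASE", ℓ, v, "MISS"⟩]
  ------------------------------------------------ [R-CASEMISS1]
  ⟨ρ, π, (case^ℓ v)⟩ ⟶ ⟨ρ, π', <0>⟩

  ℰ[ρ, t] ∉ {ℰ[ρ, t₂], ...}    π' = π ++ [⟨"R-CASE", ℓ, <t>, "MISS"⟩]
  ------------------------------------------------------------------------ [R-CASEMISS2]
  ⟨ρ, π, (case^ℓ <t> [procedure? e₁]^ℓ₁ [<t₂> e₂]^ℓ₂ ...)⟩ ⟶ ⟨ρ, π', <0>⟩

  π' = π ++ [⟨"R-CASE", ℓ, (λx. e), "HIT": ℓ₁⟩]
  ------------------------------------------------------------------------ [R-CASEHIT1]
  ⟨ρ, π, (case^ℓ (λx. e) [procedure? e₁]^ℓ₁ [<t₂> e₂]^ℓ₂ ...)⟩ ⟶ ⟨ρ, π', e₁⟩

  [⟨ℓ₂, t₂, e₂⟩, ...] = [⟨ℓₚ, tₚ, eₚ⟩, ...] ++ [⟨ℓᵢ, tᵢ, eᵢ⟩] ++ [⟨ℓₛ, tₛ, eₛ⟩, ...]
  ℰ[ρ, t] ∉ {ℰ[ρ, tₚ], ...}    ℰ[ρ, t] = ℰ[ρ, tᵢ]
  π' = π ++ [⟨"R-CASE", ℓ, <t>, "HIT": ℓᵢ⟩]
  ------------------------------------------------------------------------ [R-CASEHIT2]
  ⟨ρ, π, (case^ℓ <t> [procedure? e₁]^ℓ₁ [<t₂> e₂]^ℓ₂ ...)⟩ ⟶ ⟨ρ, π', eᵢ⟩

Group OTHER

  ------------------------------------------------ [R-APP]
  ⟨ρ, π, (λx. e) v⟩ ⟶ ⟨ρ, π, e{x ↦ v}⟩

  ⟨ρ, π₁, e₁⟩ ⟶ ⟨ρ, π₂, e₂⟩
  ------------------------------------------------ [R-CTXT]
  ⟨ρ, π₁, E[e₁]⟩ ⟶ ⟨ρ, π₂, E[e₂]⟩

  E ≠ []
  ------------------------------------------------ [R-ERROR]
  ⟨ρ, π, E[error]⟩ ⟶ ⟨ρ, π, error⟩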


Figure 12: The Reduction Relation of Concolic Evaluation 
Before concluding, it is worth mentioning that if the concolic evaluation of a
user program raises error, it is straightforward for the concolic tester to produce a 
counter-example in the language of user programs. All the necessary information 
is in the latest input environment of the concolic machine. 


3.2 Evolution of Higher-order Inputs 


If the concolic machine evaluates a user program without raising an error, the
metafunction evolve[ρ, π] analyzes the log of the machine and compiles a list
of new input environments. Specifically, for each constraint from π, evolve[ρ, π]
"switches" its truthfulness and computes all new input environments ρ' that are
compatible with the switched constraint. Here, a new input environment ρ' is
compatible with π if running the user program with ρ' produces a log π' that has
the same prefix as π plus the constraint that evolve has switched to obtain ρ'. Put
differently, evolve returns all possible evolutions of the current input that direct
the concolic tester to explore a new aspect of the behavior of the user program.
Theorem 3 from Section 4 states this property formally.


  ⟨ρ', π'⟩ ∈ evolve[ρ, π]
  ---------------------------------- [M-PREFIX]
  ⟨ρ', π'⟩ ∈ evolve[ρ, π ++ [φ]]

  π = π₁ ++ [⟨"R-COND", "FALSE", v⟩]
  π' = π₁ ++ [⟨"R-COND", "TRUE", v⟩]
  ρ' = update[ρ, π']
  ---------------------------------- [M-FALSE]
  ⟨ρ', π'⟩ ∈ evolve[ρ, π]

  π = π₁ ++ [⟨"R-COND", "TRUE", v⟩]
  π' = π₁ ++ [⟨"R-COND", "FALSE", v⟩]
  ρ' = update[ρ, π']
  ---------------------------------- [M-TRUE]
  ⟨ρ', π'⟩ ∈ evolve[ρ, π]


Figure 13: Negating Conditional Branches in User Programs 


Figure 13 collects the three most basic rules of the definition of evolve. The
first rule, [M-PREFIX], is an administrative one; it allows the removal of an
arbitrary suffix from the log π so that the rest of the rules can focus on the last
entry of the remaining log.

The next two rules, [M-FALSE] and [M-TRUE], form the first-order aspect of
evolve discussed earlier. They fire when the last entry of the log is
a path constraint from a branch of a cond expression of the user program. Their
purpose is to guide evolve to generate an input that forces concolic evaluation to
change the outcome of the branch. To do so, the two rules replace the constraint
with its "negation" and then, with metafunction update, they present the modified
list of constraints as a problem to an SMT solver and use the solution to obtain
a new input environment ρ'.
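
The interplay of [M-PREFIX] with [M-FALSE] and [M-TRUE] can be sketched in Python. This is a simplification under stated assumptions: the log contains only path constraints, formulas are ints, input names, or binary tuples, and update is a brute-force search over a small integer domain standing in for the SMT query. The names satisfies, update and evolve_first_order are hypothetical.

```python
from itertools import product

def concrete(env, t):
    if isinstance(t, int):
        return t
    if isinstance(t, str):
        return env[t]
    op, a, b = t
    x, y = concrete(env, a), concrete(env, b)
    return {"+": x + y, "<": int(x < y), "=": int(x == y)}[op]

def satisfies(env, log):
    """Does env make every path constraint in log come out as recorded?"""
    return all((concrete(env, t) != 0) == (flag == "TRUE") for _tag, flag, t in log)

def update(env, log, domain=range(-8, 9)):
    """Brute-force stand-in for the SMT solver: first assignment that satisfies log."""
    names = sorted(env)
    for values in product(domain, repeat=len(names)):
        candidate = dict(zip(names, values))
        if satisfies(candidate, log):
            return candidate
    return None

def evolve_first_order(env, log):
    """Yield (new_env, predicted_log) for each switched path constraint."""
    for i, (tag, flag, t) in enumerate(log):        # M-PREFIX: drop the suffix after i
        flipped = "FALSE" if flag == "TRUE" else "TRUE"
        predicted = log[:i] + [(tag, flipped, t)]   # M-TRUE / M-FALSE
        new_env = update(env, predicted)
        if new_env is not None:
            yield new_env, predicted
```

Each yielded pair carries the prediction for the next run: the untouched prefix followed by the switched constraint.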

Figure 14 presents the higher-order rules and Figure 15 contains the auxiliary
definitions they need. The higher-order rules switch an input constraint of the form
⟨"R-CASE", ℓ, v, _⟩. Recall that such constraints result from the evaluation of a
case expression with label ℓ in the body of a canonical function. Thus an input
  F ∈ dom(ρ), ρ(F) = C°[(case^ℓ y)]    π = π₁ ++ [⟨"R-CASE", ℓ, (λx. e), "MISS"⟩]
  ⟨ρ₁, e°⟩ ∈ action_e[ρ, locals[C°], {y} ∪ locals_f[C°]]
  fresh ℓ₁ ∉ labels(ρ₁, e°)    π' = π₁ ++ [⟨"R-CASE", ℓ, (λx. e), "HIT": ℓ₁⟩]
  ρ' = ρ₁[F ↦ C°[(case^ℓ y [procedure? e°]^ℓ₁)]]
  -------------------------------------------------------------------- [M-NEWPROC1]
  ⟨ρ', π'⟩ ∈ evolve[ρ, π]

  F ∈ dom(ρ), ρ(F) = C°[(case^ℓ y)]    π = π₁ ++ [⟨"R-CASE", ℓ, <t>, "MISS"⟩]
  ⟨ρ₁, e°⟩ ∈ action_e[ρ, locals[C°], {y} ∪ locals_f[C°]]
  fresh ℓ₁ ∉ labels(ρ₁, e°)    π' = π₁ ++ [⟨"R-CASE", ℓ, <t>, "MISS"⟩]
  ρ' = ρ₁[F ↦ C°[(case^ℓ y [procedure? e°]^ℓ₁)]]
  -------------------------------------------------------------------- [M-NEWPROC2]
  ⟨ρ', π'⟩ ∈ evolve[ρ, π]

  F ∈ dom(ρ), ρ(F) = C°[(case^ℓ y [procedure? e₁°]^ℓ₁ [<t₂> e₂°]^ℓ₂ ...)]
  π = π₁ ++ [⟨"R-CASE", ℓ, <t>, _⟩]
  ⟨ρ₁, e°⟩ ∈ action_e[ρ, locals[C°], locals_f[C°]]
  fresh ℓₙ₊₁ ∉ labels(ρ₁, e°)    π' = π₁ ++ [⟨"R-CASE", ℓ, <t>, "HIT": ℓₙ₊₁⟩]
  ρ₂ = ρ₁[F ↦ C°[(case^ℓ y [procedure? e₁°]^ℓ₁ [<t₂> e₂°]^ℓ₂ ... [<t> e°]^ℓₙ₊₁)]]
  ρ' = update[ρ₂, π']
  -------------------------------------------------------------------- [M-NEWINT]
  ⟨ρ', π'⟩ ∈ evolve[ρ, π]

  π = π₁ ++ [⟨"R-CASE", ℓ, <t>, _⟩]
  F ∈ dom(ρ), ρ(F) = C°[(case^ℓ y [procedure? e₁°]^ℓ₁ [<t₂> e₂°]^ℓ₂ ...)]
  [⟨t₂, ℓ₂⟩, ...] = [⟨tₚ, ℓₚ⟩, ...] ++ [⟨tᵢ, ℓᵢ⟩] ++ [⟨tₛ, ℓₛ⟩, ...]
  π' = π₁ ++ [⟨"R-CASE", ℓ, <t>, "HIT": ℓᵢ⟩]    ρ' = update[ρ, π']
  -------------------------------------------------------------------- [M-CHANGE]
  ⟨ρ', π'⟩ ∈ evolve[ρ, π]


Figure 14: Directed Evolution of Higher-order Inputs 


constraint is sufficient for evolve to identify the case expression in the input 
environment it concerns. 

Rules [M-NEWPROC1] and [M-NEWPROC2] apply when the case expression
with label ℓ is empty. They modify ρ to extend the case expression with
a procedure? clause, the default first clause for recognizing function arguments.
Rule [M-NEWPROC1] handles the situation where v, the value case examines, is
a function. To create a new clause, [M-NEWPROC1] calls action_e to compute new
actions; we return to this metafunction towards the end of the section. Rule
[M-NEWPROC2] handles the situation where v is a first-order concolic value <t>.
It is the same as [M-NEWPROC1] except that the new list of constraints still
ends with ⟨"R-CASE", ℓ, v, "MISS"⟩ as <t> cannot match the new procedure? clause of
the case expression.

If the case expression with label ℓ is non-empty, the concolic tester can change
its evaluation only when v is not a function. After all, if v is a function, the 
evaluation of a non-empty case always follows the first clause of the case. As 
C° is the compatible context of e°.
Δ ⊆ {x, y, z, ...} stands for any finite subset of non-concolic variables.
locals : C° → Δ
    Given a compatible context of canonical functions, computes the
    set of all variables in scope in the hole.
locals_f : C° → Δ
    Given a compatible context of canonical functions, computes the set
    of all variables in scope in the hole that are bound to functions.


(p’, v% € actiondp, A] fA, actiony: p AA — [<p, e°, ...] 


freshx@A fresh label: ", Vv’) € acti ,A 
ea ee) Ce, [E-Havoc] P V E actiondp. Al iondip. Al [E-ConsT] 
(p’, let x =f v° in (case! x)) € <p’, v% € actiong[p, A, Ap] 
actionp[p, A, Ap] 
action,.: p A —> [Xp, vò, ...] 
fresh X ¢ dom(p) xEA 
—— C-t] = [C_ Bon] 
<p[X +> 0], X» € action,[p, A] <p, x> € action,[p, A] 
Xeéed , p(X) = freshx¢@A freshe ¢ label 
om(p), p(X) =n [cii] resh x ¢ resh £ ¢ labels(p) [C-PRoc] 
<p, X» € action,[p, A] (p, Ax. (caset x)) € action,[p, A] 


Figure 15: Computation of New Actions & Local Variables 


we discuss in Section 2.3 and Section 2.4, if v is a first-order concolic value <t>,
the tester has two options: either to extend the case expression with a new
clause, or to assert that <t> matches an existing clause. Rules [M-NEWINT] and
[M-CHANGE] handle these two cases, respectively. There are two subcases for
[M-NEWINT]: <t> matches an existing clause but the tester opts to create a
dedicated clause for it in the next iteration of the concolic loop, or <t> does not
match any existing clause and the tester extends the case to accommodate it. In
either case, rule [M-NEWINT] computes the new actions for the additional clause
in the same manner as in [M-NEWPROC1] and the new clause is inserted into
the case expression. As a last step, rule [M-NEWINT] queries the SMT solver
to adjust the values of first-order inputs in the environment, ensuring that all
the clauses of the extended case are distinct. Rule [M-CHANGE] corresponds to
the discussion in Section 2.4 and its goal is to assert that <t> matches an existing
clause ℓᵢ of the case expression. Hence evolve replaces the last entry of the log
with ⟨"R-CASE", ℓ, <t>, "HIT": ℓᵢ⟩. Similar to the previous rule, as a last step rule
[M-CHANGE] consults the SMT solver to adjust the input environment given the
new constraint about <t>.
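
The bookkeeping performed by these two rules can be sketched in Python. The sketch makes several assumptions that are not in the model: the case expressions of a canonical function are flattened into a dictionary from case labels to clause lists, the action is supplied by the caller instead of being computed by the action metafunction, and the final SMT query is omitted. All names are hypothetical.

```python
import itertools

def fresh_label(cases):
    """Pick a label that is used neither by a case nor by a clause."""
    used = set(cases) | {cl for clauses in cases.values() for cl, _t, _a in clauses}
    return next(f"l{i}" for i in itertools.count() if f"l{i}" not in used)

def new_int(cases, label, t, action):
    """M-NEWINT, simplified: append the clause [<t> action] to a non-empty case."""
    if not cases[label]:
        raise ValueError("empty case: a procedure? clause has to be added first")
    new = fresh_label(cases)
    extended = dict(cases)
    extended[label] = cases[label] + [(new, t, action)]
    return extended, ("R-CASE", label, t, ("HIT", new))

def change(cases, label, t, clause_label):
    """M-CHANGE, simplified: keep the function, predict a hit on an existing clause."""
    if all(cl != clause_label for cl, _t, _a in cases[label][1:]):
        raise ValueError("no such first-order clause")
    return cases, ("R-CASE", label, t, ("HIT", clause_label))
```

In both cases the second component of the result is the entry that replaces the last constraint of the log in the prediction.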


As a final remark, metafunction action_e computes the set of actions for the
new case clauses that evolve introduces. It largely follows the grammar of e°
given in Figure 10. When it introduces a new function or a let-expression
as a new action, action_e constructs an empty case for their corresponding body
expressions. Moreover, action_e delegates to locals and locals_f to compute the set of
variables that new actions can refer to. The metafunction locals takes a context
C° and extracts the set of all local variables visible in the hole. The metafunction
locals_f is similar to locals but only extracts variables that are bound to functions.


3.3 Adding Concretization 


e ::= .... | concretize(e)
E ::= .... | concretize(E)

  n = ℰ[ρ, t]
  ------------------------------------------------ [R-CONCRETIZE]
  ⟨ρ, π, concretize(<t>)⟩ ⟶ ⟨ρ, π, <n>⟩


Figure 16: Adding Concretization to Concolic Evaluation 


Figure 16 shows the extensions for concretization. For simplicity, we identify
concrete values with <n> and consider such terms as feasible to interoperate
with external functions. We do not introduce any specific concrete evaluation
rules. Instead, we augment the reduction rules of the concolic machine with
the rule [R-CONCRETIZE], which reduces the new form, concretize(<t>), to its concrete
counterpart with the help of ℰ. Recall that the latter metafunction uses the
current input ρ to compute the value of the formula t of a concolic value.


date< = λd1. λd2. (or ((date-year d1) < (date-year d2)) ...)

main-bad = λdates. (let sorted-dates = (sort dates date<) in ...)
sort/wrap = λlst. λcmp. (sort lst (λx. λy. concretize(cmp x y)))

main-ok = λdates. (let sorted-dates = (sort/wrap dates date<) in ...)


Figure 17: sort With Concretization Wrapper 


The astute reader will have noticed that the concretization extension handles 
only first-order values. In the remainder of the section, by revisiting the example
from Section 1 in Figure 17, we argue informally that in fact this is sufficient, even
for functions. In the example, date< is a buggy comparison function and sort is
a library function that is polymorphic in its list argument. Since sort is external
to the concolic tester, the evaluation of its body is delegated to a concrete machine
which neither records constraints nor handles concolic values. This quickly
becomes an issue for testing main-bad. To discover the bug, the concolic machine 
needs to log constraints from the evaluation of date< and main-bad. However, 
this implies that date< produces concolic values which flow to sort and disrupt 
the concrete evaluation of its body. 

A straightforward non-solution is to fully concretize the list of dates and miss 
recording the critical path constraints from the evaluation of date<’s body. In 
contrast, our approach enables both the seamless interoperation of the concolic 
tester with external libraries and the collection of constraints. The key insight is 
to create wrappers that strategically concretize concolic values. By assumption,
sort is parametric in its input list. Thus sort can consume a list of concolic
values as long as the comparison function produces concrete results. This leads 
to the sort/wrap function that behaves like sort, except that its cmp argument 
is wrapped in a function that concretizes cmp’s return value. 
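
The wrapper idea carries over directly to other languages. The Python sketch below is an illustration only: Concolic, date_lt and external_sort are hypothetical stand-ins for a concolic value, for date<, and for a library sort that expects a plain 0 or 1 from its comparison argument. No constraints are logged here; the sketch only shows why concretizing the comparison result is enough.

```python
from functools import cmp_to_key

class Concolic:
    """A concrete number paired with its symbolic provenance."""
    def __init__(self, value, formula):
        self.value, self.formula = value, formula

def concretize(c):
    return c.value

def date_lt(d1, d2):
    # Hypothetical comparison that returns a concolic 0 or 1.
    return Concolic(int(d1.value < d2.value), ("<", d1.formula, d2.formula))

def external_sort(lst, less_than):
    """Stand-in for an external sort that understands only concrete results."""
    def cmp(a, b):
        if less_than(a, b) == 1:
            return -1
        return 1 if less_than(b, a) == 1 else 0
    return sorted(lst, key=cmp_to_key(cmp))

def sort_wrap(lst, cmp):
    # The list elements stay concolic; only the comparison result is concretized.
    return external_sort(lst, lambda x, y: concretize(cmp(x, y)))

dates = [Concolic(3, "A"), Concolic(1, "B"), Concolic(2, "C")]
sorted_dates = sort_wrap(dates, date_lt)
```

Passing date_lt to external_sort directly leaves the list unsorted in this sketch, because a Concolic result never compares equal to 1; the wrapper avoids this while the sorted elements keep their formulas.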

The mechanism for creating correct wrappers for higher-order constructs
from user annotations is well-studied, thus we do not formalize it.
However, we note that our proof-of-concept implementation, discussed in
Section 5, supports all the necessary features to run the example of this section
including lists, external functions, concretization annotations and interoperability
between a concrete and a concolic machine.


4 Correctness of Higher-order Concolic Testing 


This section establishes three facts about our concolic tester that together entail 
its correctness. First, given an input, if concolic evaluation of a user program 
triggers an error so does the concrete evaluation of the program (soundness). 
Second, relative to the completeness of SMT solvers, the concolic tester always 
manages to produce an input in canonical form that triggers error in the user 
program, if a counter-example for the program exists (completeness). Third, for 
each iteration of the concolic loop, the concolic tester produces a new input 
that explores a specific and selected-in-advance aspect of the behavior of the 
user program (directness). Here we discuss the necessary bits for the formal 
statements of the three facts. The complete formal development with all the 
proofs are at 

Soundness guarantees that the concolic machine respects the semantics of 
user programs. Thus, neither the information that the concolic machine logs nor its use
of concolic values affects the evaluation of programs. Specifically, the
soundness theorem states that if the concolic evaluation of a user program e with a
proper input environment ρ reduces to error,² then the concrete evaluation of e with ρ
also reduces to error. Since error represents bugs in the user program, soundness
effectively guarantees that concolic evaluation does not discover spurious bugs.

For the formal statement of the theorem, we first introduce a few technical 
devices. For closed user programs, i.e., those without input or other free variables, 
we define a standard call-by-value reduction semantics with reduction relation 
⟶c. Let 𝒞[ρ, e] be the metafunction that constructs concrete inputs from the
input environment ρ and substitutes them in e. That is, 𝒞 traverses the user
program e, dropping any concretize forms and, for each X in e, if ρ maps X to
a number, 𝒞 replaces X with the number. Otherwise, if ρ maps X to a function, 𝒞
compiles the canonical function into an equivalent concrete function and replaces 
X with the result. 


² An environment ρ is proper if (i) it maps all concolic variables occurring free in
canonical functions in p to numbers, (ii) all labels in p are unique and (iii) the 
concrete counterparts of the tests of the clauses in case expressions are numbers. In 
this section, we only consider proper environments. 
Theorem 1 (Soundness). Let e be any user program written in the extended
language from Section 3.3, i.e., e with concretize forms. Let ρ be any input
environment closing e. If ⟨ρ, [], ℒ[ρ, e]⟩ ⟶* ⟨ρ, π, error⟩ then 𝒞[ρ, e] ⟶c* error.


Completeness captures that if the concrete evaluation of a user program
with some input raises error, our concolic tester can find the input through the
iterative evolution of initially default inputs. More precisely, Theorem 2 formalizes
the iterative evolution process as a sequence of pairs of inputs and logs
⟨ρ₁, π₁⟩, ..., ⟨ρₘ, πₘ⟩ such that (i) the sequence starts with an input environment
that contains numbers and default canonical functions and ends with an input
environment that triggers error; (ii) each πᵢ is the log produced by the concolic
evaluation of the user program with input environment ρᵢ; and (iii) most
importantly, every pair of adjacent elements in the sequence is connected by evolve:
⟨ρᵢ₊₁, π'⟩ ∈ evolve[ρᵢ, πᵢ] and π' is equivalent to a prefix of πᵢ₊₁. In particular,
conclusion (iii) says that using the logs from each iteration, evolve predicts the logs
for the next iteration.


Theorem 2 (Completeness). For any e written in the user language in Section 3.1
with concolic variables X₁, ..., Xₙ, if there exist closed values v₁, ..., vₙ
in the language of user programs such that none of the values contain error and
e{X₁ ↦ v₁, ...} ⟶c* error then there exists a sequence of environments and logs
⟨ρ₁, π₁⟩, ..., ⟨ρₘ, πₘ⟩ such that dom(ρ₁) = {X₁, ..., Xₙ} and

1. For all X ∈ dom(ρ₁), either ρ₁(X) = 0 or ρ₁(X) = λx. (case^ℓ x).

2. For all 1 ≤ i < m, ⟨ρᵢ, [], ℒ[ρᵢ, e]⟩ ⟶* ⟨ρᵢ, πᵢ, eᵢ⟩.

3. For all 1 ≤ i < m, there exists a pair ⟨ρᵢ₊₁, π'ᵢ₊₁⟩ ∈ evolve[ρᵢ, πᵢ] such that π'ᵢ₊₁ is
   equivalent to a prefix of πᵢ₊₁.

4. ⟨ρₘ, [], ℒ[ρₘ, e]⟩ ⟶* ⟨ρₘ, πₘ, error⟩.


There are two points worth unpacking here. First, conclusion 1 assumes an
appropriate choice between numbers and default canonical functions in the initial
environment ρ₁. In an implementation, either the user supplies an input
specification or the tester employs some sophisticated search strategy over all
combinations. Second, since the user program may diverge, in conclusion 2 the concolic
machine may need to end the evaluation early. As the maximum number of steps
needed is finite, an implementation can overcome this by setting a time limit.
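
One way to realize such a limit is to bound the number of reduction steps instead of wall-clock time. The Python sketch below assumes a hypothetical single-step function step that returns the next machine state, or None when no reduction applies; run_with_fuel and the fuel parameter are names chosen for this sketch.

```python
def run_with_fuel(step, state, fuel):
    """Apply `step` until it returns None or the step budget is exhausted.

    Returns (last_state, finished); finished is False when the budget ran out.
    """
    for _ in range(fuel):
        next_state = step(state)
        if next_state is None:
            return state, True
        state = next_state
    return state, False
```

When the budget runs out, the log accumulated so far is still available in the last state, so the tester can evolve the input from a partial run.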

We prove Theorem 2 in two steps. First, we show that if there is an input for
which the concrete evaluation of a user program raises error, then there exists
an input environment ρ that contains numbers and canonical functions that also
causes the concolic machine to trigger an error. Thus this step validates the
definition of canonical functions.


Lemma 1 (Representation Completeness). We say that ⟨ρ, π⟩ is a proper
counterexample for a user program e if (i) ρ closes e, i.e., FV(e) ⊆ dom(ρ), (ii)
⟨ρ, [], ℒ[ρ, e]⟩ ⟶* ⟨ρ, π, error⟩ and (iii) π does not contain input constraints of
the form ⟨"R-CASE", ℓ, v, "MISS"⟩.
For any user program e with inputs X₁, ..., Xₙ, if there exist closed values
v₁, ..., vₙ such that no value contains error and e{X₁ ↦ v₁, ...} ⟶c* error then
there exists a proper counterexample of e.


In the second step of the proof of Theorem 2, we show that the evolution of
inputs during the concolic loop results in an input environment that can trigger
an error if such an input exists. As a consequence, the concolic tester only needs
to explore inputs it generates with evolve.


Lemma 2 (Search Completeness). For any e with inputs X₁, ..., Xₙ, if e has
a proper counterexample then there exists a sequence of environments and logs
⟨ρ₁, π₁⟩, ..., ⟨ρₘ, πₘ⟩ satisfying (1)-(4).


The last fact we establish for our concolic tester is necessary for the proof of
completeness but also has value on its own. It entails that, at each iteration of the
concolic loop, the concolic tester aims to explore a specific aspect of the behavior
of the user program and indeed produces new inputs that achieve this goal. We
call this the concolic property. Formally, Theorem 3 shows that after the concolic
machine evaluates a user program with an input environment produced by evolve,
the machine's log is a prefix of the log evolve predicts.


Theorem 3 (Concolic). For any e and ρ₁, if

1. ⟨ρ₁, [], ℒ[ρ₁, e]⟩ ⟶* ⟨ρ₁, π₁ ++ [φ₁], e₁⟩.
2. π₁ has no "miss" input constraints (of the form ⟨"R-CASE", ℓ, v, "MISS"⟩).
3. ⟨ρ₂, π₁' ++ [φ₁']⟩ ∈ evolve[ρ₁, π₁ ++ [φ₁]].

then ⟨ρ₂, [], ℒ[ρ₂, e]⟩ ⟶* ⟨ρ₂, π₂ ++ [φ₂], e₂⟩ such that π₁' ++ [φ₁'] is equivalent to π₂ ++ [φ₂].


5 From the Model to a Proof-of-Concept Implementation 


A question about our model is whether it can serve as a guide for an effective 
higher-order concolic tester. To provide some positive evidence, we have imple- 
mented a prototype that closely follows the model. The prototype plays the role 
of a sanity check that our theoretically-correct model is not inherently imprac- 
tical; performance was not a serious concern. Notably, the prototype’s input 
generation strategy is naive. To ensure progress, the prototype sets a configurable
timeout for each run and avoids duplicating work by keeping a log of
each input it generates and tries. We leave the details to the supplementary
material and only summarize our experimental results here.

We compiled a benchmark suite from three sources. The primary source is
Nguyễn et al. [33]'s work, specifically the jfp branch of their repository.
These programs ultimately come from other
papers; see Figure 18. The second source is CutEr, the Erlang concolic tester.
We collected all of the test cases in CutEr's test suite that use higher-order
functions and translated them to our prototype's language. Finally, we contribute
three small examples as part of this work that have proven out of reach for
Name        Failures   Source
games       3/3
hors        0/23       Kobayashi et al.
mochi-new   0/11
octy        0/13
others      1/26
softy       0/12
terauchi    0/7
cuter       0/20
c-hop       0/3        Interesting examples we discovered
total       4/118


Figure 18: Benchmark Results 


both Nguyễn et al. [33]'s tool and CutEr. Overall, the benchmark programs
use the Scheme numeric tower, booleans, lists, objects encoded as functions [1], 


strings, symbols, and higher-order functions. 

Out of 118 benchmarks, our prototype fails to discover bugs in 4 of the 
programs. These programs can be grouped based on two limitations of our pro- 
totype. First, our search strategy is naive and as a result two benchmarks time 
out after an hour. Second, our prototype does not handle Racket's struct declaration
and a few other complex syntactic features of Racket that two of
Nguyễn et al. [33]'s benchmarks use.


6 Related Work 


Concolic Testing. CutEr is a concolic testing tool for Erlang.
Although CutEr generates functions, it does not generate inputs that contain
calls in their bodies.³ Palacios and Vidal offer an instrumentation approach
for concolic testers of functional languages but do not address the generation of 
higher-order inputs. 

extend the design of path constraints with symbolic subtype 
expressions to handle polymorphism in object-oriented languages. However, their 
input generation uses only already defined classes. 

Path explosion remains a central challenge for concolic testing techniques,
and it is a challenge that has led to approaches that rely on the correct
handling of function inputs. compute function summaries on- 
the-fly to tame the combinatorial explosion of the search space of control-flow 
paths. Similarly, performs symbolic execution compositionally 
using function summaries. FOCAL breaks programs down into small units
to reduce the search space; it tests each unit individually and constructs
system-level tests by using summaries. In all three cases, the summaries are first-order
and do not include higher-order interactions between functions. 


3 Personal communication with Kostis Sagonas. 
Higher-order Symbolic Execution. Nguyễn et al. and Tobin-Hochstadt and Van Horn
propose the idea of refining symbolic unknown values into
canonical shapes to generate higher-order counterexamples. We adapt their
refinement rules into the grammar of canonical functions in Section 3.1.
Unfortunately, despite opposite claims, their rules are not complete and fail to generate
a counter-example for our buggy call-twice example. Our work
provably fixes this issue. Moreover, we introduce the notion of input constraints to
support the directed search of the higher-order input space. 


Random Testing. QuickCheck supports random testing of higher-order
functions by using user-provided maps from the input type to integers and
from integers to the output type. improves upon
QuickCheck by using a predefined datatype to represent the syntax of
higher-order functions. LambdaTester focuses on testing and generating
higher-order functions that mutate an object’s state in order to affect
control-flow paths that depend on this state. Klein et al. randomly generate
higher-order inputs that call their arguments to trigger bugs in stateful
programs with opaque types.


7 Conclusion 


This work offers a theoretical roadmap for generalizing concolic testing to pro- 
grams with higher-order inputs. The central innovation is that our concolic tester 
records salient information about the interactions between a user program and 
its (canonical) inputs. The information induces an SMT problem that describes a 
new canonical input that exercises a yet unexplored aspect of the user program. 

For this paper, we focus on the quintessential higher-order linguistic feature, 
higher-order functions. That said, much remains to be done to build this the- 
ory into a production tool by, for example, using the insights of this paper to 
support other features such as objects and mutable state. Specifically for state, 
our model can be easily and soundly extended to imperative user programs. 
However, completeness and the generation of stateful function inputs require
further study. Finally, another important direction is improving the
implementation, notably exploring search optimizations and strategies. Our
prototype uses a naive strategy and this hampers its performance. Nevertheless,
we view this paper as an essential first step towards sophisticated testing
strategies for modern programming languages.
Abstract. Most automated verifiers for separation logic are based on
the symbolic-heap fragment, which disallows both the magic-wand opera- 
tor and the application of classical Boolean operators to spatial formulas. 
This is not surprising, as support for the magic wand quickly leads to un- 
decidability, especially when combined with inductive predicates for rea- 
soning about data structures. To circumvent these undecidability results, 
we propose assigning a more restrictive semantics to the separating con- 
junction. We argue that the resulting logic, strong-separation logic, can 
be used for symbolic execution and abductive reasoning just like “stan- 
dard” separation logic, while remaining decidable even in the presence 
of both the magic wand and the list-segment predicate—a combination 
of features that leads to undecidability for the standard semantics. 


1 Introduction 


Separation logic is one of the most successful formalisms for the analysis
and verification of programs making use of dynamic resources such as heap
memory and access permissions [7,30,10,5,17,24,9]. At the heart of the success
of separation logic (SL) is the separating conjunction, *, which supports concise
statements about the disjointness of resources. In this article, we will focus on
separation logic for describing the heap in single-threaded heap-manipulating
programs. In this setting, the formula φ * ψ can be read as “the heap can be
split into two disjoint parts, such that φ holds for one part and ψ for the other.”
Our article starts from the following observation: The standard semantics of
* allows splitting a heap into two arbitrary sub-heaps. The magic-wand operator
-*, which is the adjoint of *, then allows adding arbitrary heaps. This arbitrary
splitting and adding of heaps makes reasoning about SL formulas difficult, and
quickly renders separation logic undecidable when inductive predicates for data
structures are considered. For example, Demri et al. recently showed that adding
only the singly-linked list-segment predicate to propositional separation logic
(i.e., with *, -* and classical connectives ∧, ∨, ¬) leads to undecidability [16].
Most SL specifications used in automated verification do not, however, make
use of arbitrary heap compositions. For example, the widely used symbolic-heap
fragments of separation logic considered, e.g., in [3,4,13,21,22], have the following
property: a symbolic heap satisfies a separating conjunction if and only if one
can split the model at locations that are the values of some program variables.
Motivated by this observation, we propose a more restrictive separating
conjunction that allows splitting the heap only at locations that are the values of some
(a) A model of ls(x, y) * ls(y, nil) in both the standard semantics and our semantics.

(b) A model of ls(x, nil) * t in the standard semantics.

Fig. 1: Two models and their decomposition into disjoint submodels. Dangling
arrows represent dangling pointers.


program variables. We call the resulting logic strong-separation logic.
Strong-separation logic (SSL) shares many properties with standard
separation-logic semantics; for example, the models of our logic form a
separation algebra. Because the frame rule and other standard SL inference
rules continue to hold for SSL, SSL is suitable for deductive Hoare-style
verification à la [23,40], symbolic execution [4], as well as abductive
reasoning [10]. At the same time, SSL has much better computational properties
than standard SL, especially when formulas contain expressive features such as
the magic wand, -*, or negation.
We now give a more detailed introduction to the contributions of this article.


The standard semantics of the separating conjunction. To be able to justify our
changed semantics of *, we need to introduce a bit of terminology. As standard
in separation logic, we interpret SL formulas over stack-heap pairs. A stack is
a mapping of the program variables to memory locations. A heap is a finite
partial function between memory locations; if a memory location l is mapped to
location l’, we say the heap contains a pointer from l to l’. A memory location
l is allocated if there is a pointer of the heap from l to some location l’. We call
a location dangling if it is the target of a pointer but not allocated; a pointer is
dangling if its target location is dangling.

Dangling pointers arise naturally in compositional specifications, i.e., in
formulas that employ the separating conjunction *: In the standard semantics of
separation logic, a stack-heap pair (s, h) satisfies a formula φ * ψ if it is possible
to split the heap h into two disjoint parts h1 and h2 such that (s, h1) satisfies
φ and (s, h2) satisfies ψ. Here, disjoint means that the allocated locations of h1
and h2 are disjoint; however, the targets of the pointers of h1 and h2 do not have
to be disjoint.

We illustrate this in Fig. 1a, where we show a graphical representation of
a stack-heap pair (s, h) that satisfies the formula ls(x, y) * ls(y, nil). Here, ls
denotes the list-segment predicate. As shown in Fig. 1a, h can be split into two
disjoint parts h1 and h2 such that (s, h1) is a model of ls(x, y) and (s, h2) is a
model of ls(y, nil). Now, h1 has a dangling pointer with target s(y) (displayed
with an orange background), while no pointer is dangling in the heap h.
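
The split described above can be replayed on a small concrete instance. The following Python sketch is only an illustration: the encoding of stacks and heaps as dictionaries, the helper name dangling_locations, and the concrete location numbers are assumptions of this sketch and do not come from the paper.

```python
def dangling_locations(heap):
    """Locations that are the target of some pointer but are not allocated."""
    return set(heap.values()) - set(heap)


# Hypothetical instance: a list from s(x) through s(y) to s(nil).
stack = {"x": 1, "y": 3, "nil": 0}
heap = {1: 2, 2: 3, 3: 4, 4: 0}

# Split the heap at the location s(y) = 3.
h1 = {1: 2, 2: 3}   # the segment from s(x) to s(y)
h2 = {3: 4, 4: 0}   # the segment from s(y) to s(nil)

# The two parts are domain-disjoint and together give back the heap.
assert set(h1).isdisjoint(h2) and {**h1, **h2} == heap

# In h1 the pointer into s(y) dangles, because location 3 is allocated in h2.
print(dangling_locations(h1))    # {3}

# In the full heap, location 3 is allocated again; only the nil location
# remains as a pointer target that is never allocated.
print(dangling_locations(heap))  # {0}
```

The set-based helper follows the definition literally, so it also reports the nil location of the full heap; the point of the example is that s(y) is reported for h1 but not for h.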


In what sense is the standard semantics too permissive? The standard semantics
of * allows splitting a heap into two arbitrary sub-heaps, which may result in the
introduction of arbitrary dangling pointers into the sub-heaps. We note, however,
that the introduction of dangling pointers is not arbitrary when splitting the
models of ls(x, y) * ls(y, nil); there is only one way of splitting the models of
this formula, namely at the location of program variable y. The formula ls(x, y) *
ls(y, nil) belongs to a certain variant of the symbolic-heap fragment of separation
logic, and all formulas of this fragment have the property that their models can
only be split at locations that are the values of some variables.

Standard SL semantics also allows the introduction of dangling pointers
without the use of variables. Fig. 1b shows a model of ls(x, nil) * t, assuming the
standard semantics. Here, the formula t (for true) stands for any arbitrary heap.
In particular, this includes heaps with arbitrary dangling pointers into the list
segment ls(x, nil). This power of introducing arbitrary dangling pointers is what
is used by Demri et al. for their undecidability proof of propositional separation
logic with the singly-linked list-segment predicate [16].


Strong-separation logic. In this article, we want to explicitly disallow the implicit
sharing of dangling locations when composing heaps. We propose to parameterize
the separating conjunction by the stack and exclusively allow the union of heaps
that only share locations that are pointed to by the stack. For example, the
model in Fig. 1b is not a model of ls(x, nil) * t in our semantics because of the
dangling pointers in the sub-heap that satisfies t. Strong-separation logic (SSL)
is the logic resulting from this restricted definition of the separating conjunction.


Why should I care? We argue that SSL is a promising proposal for automated 
program verification: 

1) We show that the memory models of strong-separation logic form a
separation algebra [11], which guarantees the soundness of the standard frame rule of
SL [40]. Consequently, SSL can potentially be used instead of standard SL in
a wide variety of (semi-)automated analyzers and verifiers, including Hoare-style
verification [40], symbolic execution [4], and bi-abductive shape analysis [10].

2) To date, most automated reasoners for separation logic have been
developed for symbolic-heap separation logic [3,4,10,21,22,26,32,27]. In these
fragments of separation logic, assertions about the heap can exclusively be combined
via *; neither the magic wand -* nor classical Boolean connectives are permitted.
We show that the strong semantics agrees with the standard semantics on
symbolic heaps. For this reason, symbolic-heap SL specifications remain unchanged
when switching to strong-separation logic.

3) We establish that the satisfiability and entailment problems for full
propositional separation logic with the singly-linked list-segment predicate are
decidable in our semantics (in PSPACE), in stark contrast to the aforementioned
undecidability result obtained by Demri et al. assuming the standard semantics.

4) The standard Hoare-style approach to verification requires discharging
verification conditions (VCs), which amounts to proving for loop-free pieces of
code that a pre-condition implies some post-condition. Discharging VCs can be
automated by calculi that symbolically execute the pre-condition forward resp.
the post-condition backward, and then use an entailment checker to prove
the implication. For SL, symbolic execution calculi can be formulated using
the magic wand resp. the septraction operator. However, these operators have
proven to be difficult for automated procedures: “VC-generators do not work
especially well with separation logic, as they introduce magic-wand -* operators
which are difficult to eliminate.” [2, p. 131] In contrast, we demonstrate that
SSL can overcome the described difficulties. We formulate a forward symbolic
execution calculus for a simple heap-manipulating programming language using
SSL. In conjunction with our entailment checker, see 3), our calculus gives rise
to a fully-automated procedure for discharging VCs of loop-free code segments.

5) Computing solutions to the abduction problem is an integral building block 
of Facebook’s Infer analyzer [9], required for a scalable and fully-automated 
shape analysis [10]. We show how to compute explicit representations of op- 
timal, i.e., logically weakest and spatially minimal, solutions to the abduction 
problem for the separation logic considered in this paper. The result is of
theoretical interest, as explicit representations for optimal solutions to the
abduction problem are hard to obtain [10,19].


Contributions. Our main contributions are as follows: 


1. We propose and motivate strong-separation logic (SSL), a new semantics for 
separation logic. 

2. We present a PSPACE decision procedure for strong-separation logic with
points-to assertions, the list-segment predicate ls(x, y), and spatial and
classical operators, i.e., *, -*, ∧, ∨, ¬, a logic that is undecidable when assuming
the standard semantics [16].

3. We present symbolic execution rules for SSL, which allow us to discharge 
verification conditions fully automatically. 

4. We show how to compute explicit representations of optimal solutions to the 
abduction problem for the SSL considered in (2). 


We strongly believe that these results motivate further research on SSL (e.g., 
going beyond the singly-linked list-segment predicate, implementing our decision 
procedure and integrating it into fully-automated analyzers). 


Related work. The undecidability of separation logic was established already
in [12]. Since then, decision problems for a large number of fragments and
variants of separation logic have been studied. Most of this work has been on
symbolic-heap separation logic or other variants of the logic that neither
support the magic wand nor the use of negation below the * operator. While
entailment in the symbolic-heap fragment with inductive definitions is undecidable in
general [1], there are decision procedures for variants with built-in lists and/or
trees [3,13,34,35,36], support for defining variants of linear structures [20] or tree
structures [42,22] or graphs of bounded tree width [21,26]. The expressive heap
logics STRAND [29] and Dryad [37] also have decidable fragments, as have some
other separation logics that allow combining shape and data constraints. Besides
the already mentioned work [35,36], these include [28,25].


1 An extension of this result to a separation logic that also supports trees can be found 
in the dissertation of the first author [31] 
Among the aforementioned works, the graph-based decision procedures of
and [25] are most closely related to our approach. Note, however, that neither
of these works supports reasoning about magic wands or negation below the
separating conjunction.

In contrast to symbolic-heap SL, separation logics with the magic wand 
quickly become undecidable. Propositional separation logic with the magic wand, 
but without inductive data structures, was shown to be decidable in PSPACE 
in the early days of SL research [12]. Support for this fragment was added to 
CVC4 a few years ago [39]. Some tools have “lightweight” support for the magic 
wand involving heuristics and user annotations, in part motivated by the lack of 
decision procedures [6,41].

There is a significant body of work studying first-order SL with the magic
wand and unary points-to assertions, but without a list predicate. This logic
was first shown to be undecidable in [8], a result that has since been refined,
showing e.g. that while satisfiability is still in PSPACE if we allow one
quantified variable [15], two variables already lead to undecidability, even without
the separating conjunction [14]. Echenim et al. [18] have recently addressed the
satisfiability problem of SL with an ∃*∀* quantifier prefix, separating conjunction,
magic wand, and full Boolean closure, but no inductive definitions. The logic
was shown to be undecidable in general (contradicting an earlier claim [38]), but
decidable in PSPACE under certain restrictions.


Outline. In Section 2, we introduce two semantics of propositional separation
logic, the standard semantics and our new strong-separation semantics. We show
the decidability of the satisfiability and entailment problems of SSL with lists in
Section 3. We present symbolic execution rules for SSL in Section 4. We show
how to compute explicit representations of optimal solutions to the abduction
problem in Section 5. We conclude in Section 6. All missing proofs are given in
the extended version [33] for space reasons.


2 Strong- and Weak-Separation Logic 


2.1 Preliminaries 


We denote by |X| the cardinality of the set X. Let f be a (partial) function. Then
dom(f) and img(f) denote the domain and image of f, respectively. We write
|f| := |dom(f)| and f(x) = ⊥ for x ∉ dom(f). We frequently use set notation
to define and reason about partial functions: f := {x1 ↦ y1, ..., xk ↦ yk} is the
partial function that maps xi to yi, 1 ≤ i ≤ k, and is undefined on all other
values; f⁻¹(b) is the set of all elements a with f(a) = b; we write f ∪ g resp.
f ∩ g for the union resp. intersection of partial functions f and g, provided that
f(a) = g(a) for all a ∈ dom(f) ∩ dom(g); similarly, f ⊆ g holds if dom(f) ⊆
dom(g). Sets and ordered sequences are denoted in boldface, e.g., x. To list the
elements of a sequence, we write ⟨x1, ..., xk⟩.

We assume a linearly-ordered infinite set of variables Var with nil ∈ Var and
denote by max(v) the maximal variable among a set of variables v according 


Strong-Separation Logic 669 


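\[
\begin{array}{rcl}
\tau & ::= & \mathbf{emp} \mid x \mapsto y \mid \mathtt{ls}(x,y) \mid x = y \mid x \neq y \\
\varphi & ::= & \tau \mid \varphi * \varphi \mid \varphi \mathbin{-\!\circledast} \varphi \mid \varphi \land \varphi \mid \varphi \lor \varphi \mid \lnot\varphi
\end{array}
\]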


Fig. 2: The syntax of separation logic with list segments. 


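\[
\begin{array}{lcl}
(s,h) \models \mathbf{emp} & \text{iff} & \operatorname{dom}(h) = \emptyset \\
(s,h) \models x = y & \text{iff} & \operatorname{dom}(h) = \emptyset \text{ and } s(x) = s(y) \\
(s,h) \models x \neq y & \text{iff} & \operatorname{dom}(h) = \emptyset \text{ and } s(x) \neq s(y) \\
(s,h) \models x \mapsto y & \text{iff} & h = \{ s(x) \mapsto s(y) \} \\
(s,h) \models \mathtt{ls}(x,y) & \text{iff} & \operatorname{dom}(h) = \emptyset \text{ and } s(x) = s(y), \text{ or there exist } n \geq 1,\ \ell_0,\ldots,\ell_n \text{ with} \\
 & & h = \{ \ell_0 \mapsto \ell_1, \ldots, \ell_{n-1} \mapsto \ell_n \},\ s(x) = \ell_0 \text{ and } s(y) = \ell_n \\
(s,h) \models \varphi_1 \land \varphi_2 & \text{iff} & (s,h) \models \varphi_1 \text{ and } (s,h) \models \varphi_2 \\
(s,h) \models \lnot\varphi & \text{iff} & (s,h) \not\models \varphi \\
(s,h) \models^{wk} \varphi_1 * \varphi_2 & \text{iff} & \text{there exist } h_1, h_2 \text{ with } h = h_1 + h_2,\ (s,h_1) \models^{wk} \varphi_1,\ (s,h_2) \models^{wk} \varphi_2 \\
(s,h) \models^{wk} \varphi_1 \mathbin{-\!\circledast} \varphi_2 & \text{iff} & \text{there exists } h_1 \text{ with } (s,h_1) \models^{wk} \varphi_1,\ h + h_1 \neq \bot \text{ and } (s, h + h_1) \models^{wk} \varphi_2 \\
(s,h) \models^{st} \varphi_1 * \varphi_2 & \text{iff} & \text{there exist } h_1, h_2 \text{ with } h = h_1 \uplus^{s} h_2,\ (s,h_1) \models^{st} \varphi_1,\ (s,h_2) \models^{st} \varphi_2 \\
(s,h) \models^{st} \varphi_1 \mathbin{-\!\circledast} \varphi_2 & \text{iff} & \text{there exists } h_1 \text{ with } (s,h_1) \models^{st} \varphi_1,\ h \uplus^{s} h_1 \neq \bot \text{ and } (s, h \uplus^{s} h_1) \models^{st} \varphi_2
\end{array}
\]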


Fig. 3: The standard, “weak” semantics of separation logic, ⊨ʷᵏ, and the “strong”
semantics, ⊨ˢᵗ. We write ⊨ when there is no difference between ⊨ʷᵏ and ⊨ˢᵗ.


to this order. In Fig. 2, we define the syntax of the separation-logic fragment
we study in this article. The atomic formulas of our logic are the empty-heap
predicate emp, points-to assertions x ↦ y, the list-segment predicate ls(x, y),
equalities x = y and disequalities x ≠ y;² in all these cases, x, y ∈ Var.
Formulas are closed under the classical Boolean operators ∧, ∨, ¬ as well as under
the separating conjunction * and the existential magic wand, also called the
septraction, -⊛ (see e.g. [8]). We collect the set of all SL formulas in SL. We
also consider derived operators and formulas, in particular the separating
implication (or magic wand), -*, defined by φ -* ψ := ¬(φ -⊛ ¬ψ).³ We also use
true, defined as t := emp ∨ ¬emp. Finally, for Φ = {φ1, ..., φn}, we define
*Φ := φ1 * φ2 * ··· * φn if n ≥ 1 and *Φ := emp if n = 0. By fvs(φ) we denote
the set of (free) variables of φ. We define the size of the formula φ as |φ| = 1
for atomic formulas φ, |φ1 × φ2| := |φ1| + |φ2| + 1 for × ∈ {∧, ∨, *, -⊛} and
|¬φ1| := |φ1| + 1.
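
As a small worked instance of the size definition, consider the derived formula t = emp ∨ ¬emp and the formula ls(x, y) * t; the choice of these two example formulas is ours and serves only to illustrate how the three cases of the definition combine:

```latex
\[
\begin{aligned}
|\mathbf{t}| &= |\mathbf{emp} \lor \lnot\mathbf{emp}|
              = |\mathbf{emp}| + |\lnot\mathbf{emp}| + 1
              = 1 + (1 + 1) + 1 = 4, \\
|\mathtt{ls}(x,y) * \mathbf{t}| &= |\mathtt{ls}(x,y)| + |\mathbf{t}| + 1
              = 1 + 4 + 1 = 6.
\end{aligned}
\]
```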


2.2 Two Semantics of Separation Logic 


Memory model. Loc is an infinite set of heap locations. A stack is a partial
function s: Var ⇀ Loc. A heap is a partial function h: Loc ⇀ Loc. A model is
a stack-heap pair (s, h) with nil ∈ dom(s) and s(nil) ∉ dom(h). We let locs(h) :=


2 As our logic contains negation, x ≠ y can be expressed as ¬(x = y). However, we
treat disequalities as atomic to be able to use them in the positive fragment of our
logic, defined later, which precludes the use of negation.

3 As -* can be defined via -⊛ and ¬ and vice versa, the expressivity of our logic does
not depend on which operator we choose. We have chosen -⊛ because we can include
this operator in the positive fragment considered later on.
dom(h) ∪ img(h). A location ℓ is dangling if ℓ ∈ img(h) \ dom(h). We write S
for the set of all stacks and H for the set of all heaps.


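Two notions of disjoint union of heaps. We write h1 + h2 for the union of disjoint
heaps, i.e.,

\[
h_1 + h_2 :=
\begin{cases}
h_1 \cup h_2, & \text{if } \operatorname{dom}(h_1) \cap \operatorname{dom}(h_2) = \emptyset \\
\bot, & \text{otherwise.}
\end{cases}
\]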


This standard notion of disjoint union is commonly used to assign semantics
to the separating conjunction and magic wand. It requires that h1 and h2 are
domain-disjoint, but does not impose any restrictions on the images of the heaps.
In particular, the dangling pointers of h1 may alias arbitrarily with the domain
and image of h2 and vice versa.

Let s be a stack. We write h1 ⊎ˢ h2 for the disjoint union of h1 and h2 that
restricts aliasing of dangling pointers to the locations in stack s. This yields an
infinite family of union operators: one for each stack. Formally,

\[
h_1 \uplus^{s} h_2 :=
\begin{cases}
h_1 + h_2, & \text{if } \operatorname{locs}(h_1) \cap \operatorname{locs}(h_2) \subseteq \operatorname{img}(s) \\
\bot, & \text{otherwise.}
\end{cases}
\]


Intuitively, h1 ⊎ˢ h2 is the (disjoint) union of heaps that share only locations
that are in the image of the stack s. Note that if h1 ⊎ˢ h2 is defined then h1 + h2
is defined, but not vice versa.
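
The difference between the two union operators can be made concrete with a short Python sketch. The dictionary encoding of heaps and stacks, the function names weak_union and strong_union, and the use of None for the undefined result are assumptions of this sketch, not notation from the paper.

```python
def locs(heap):
    """All locations occurring in a heap: dom(h) together with img(h)."""
    return set(heap) | set(heap.values())


def weak_union(h1, h2):
    """Standard disjoint union: defined iff the domains are disjoint."""
    if set(h1) & set(h2):
        return None  # undefined
    return {**h1, **h2}


def strong_union(stack, h1, h2):
    """Stack-indexed union: additionally, every location shared by the two
    heaps must be the value of some stack variable."""
    if not (locs(h1) & locs(h2)) <= set(stack.values()):
        return None  # undefined
    return weak_union(h1, h2)


# Two pointers that meet at location 2 (hypothetical example values).
h1 = {1: 2}
h2 = {3: 2}

print(weak_union(h1, h2))                        # {1: 2, 3: 2}
print(strong_union({"x": 1}, h1, h2))            # None: location 2 is unlabeled
print(strong_union({"x": 1, "y": 2}, h1, h2))    # {1: 2, 3: 2}
```

The second call illustrates that the strong union can be undefined where the standard union is defined, while the converse cannot happen because strong_union only returns a heap by delegating to weak_union.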

Just like the standard disjoint union +, the operator ⊎ˢ gives rise to a
separation algebra, i.e., a cancellative, commutative partial monoid [11]:


Lemma 1. Let s be a stack and let u be the empty heap (i.e., dom(u) = ∅). The
triple (H, ⊎ˢ, u) is a separation algebra.


Weak- and strong-separation logic. Both + and ⊎ˢ can be used to give a
semantics to the separating conjunction and septraction. We denote the corresponding
model relations ⊨ʷᵏ and ⊨ˢᵗ and define them in Fig. 3. Where the two semantics
agree, we simply write ⊨.

In both semantics, emp only holds for the empty heap, and x = y holds for
the empty heap when x and y are interpreted by the same location.⁴ Points-to
assertions x ↦ y are precise, i.e., only hold in singleton heaps. (It is, of course,
possible to express intuitionistic points-to assertions by x ↦ y * t.) The
list-segment predicate ls(x, y) holds in possibly-empty lists of pointers from s(x) to
s(y). The semantics of Boolean connectives are standard. The semantics of the
separating conjunction, *, and septraction, -⊛, differ based on the choice of +
vs. ⊎ˢ for combining disjoint heaps. In the former case, denoted ⊨ʷᵏ, we get the
standard semantics of separation logic (cf. [40]). In the latter case, denoted ⊨ˢᵗ,
we get a semantics that imposes stronger requirements on sub-heap composition:
Sub-heaps may only overlap at locations that are stored in the stack.


4 Usually x = y is defined to hold for all heaps, not just the empty heap, when x 
and y are interpreted by the same location; however, this choice does not change the 
expressivity of the logic: the formula (x = y) * t expresses the standard semantics. 
Our choice is needed for the results on the positive fragment considered in Section 2.3.
Fig. 4: Two models of (ls(a, nil) * t) ∧ (ls(b, nil) * t) for a stack with domain a, b
and a stack with domain a, b, c.


Because the semantics ⊨ˢᵗ imposes stronger constraints, we will refer to the
standard semantics ⊨ʷᵏ as the weak semantics of separation logic and to the
semantics ⊨ˢᵗ as the strong semantics of separation logic. Moreover, we use the
terms weak-separation logic (WSL) and strong-separation logic (SSL) to
distinguish between SL with the semantics ⊨ʷᵏ and ⊨ˢᵗ.


Example 1. Let φ := a ≠ b * (ls(a, nil) * t) ∧ (ls(b, nil) * t). In Fig. 4, we show
two models of φ. On the left, we assume that a, b are the only program variables,
whereas on the right, we assume that there is a third program variable c.

Note that the latter model, where the two lists overlap, is possible in SSL
only because the lists come together at the location labeled by c. If we removed
variable c from the stack, the model would no longer satisfy φ according to the
strong semantics, because ⊎ˢ would no longer allow splitting the heap at that
location. Conversely, the model would still satisfy φ with standard semantics.

This is a feature rather than a bug of SSL: By demanding that the user of
SSL specify aliasing explicitly (for example, by using the specification ls(a, c) *
ls(b, c) * ls(c, nil) ∧ c ≠ nil), we rule out unintended aliasing effects. △
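
The effect described in Example 1 can be checked mechanically on a tiny model. The following Python sketch is a brute-force evaluator for the septraction-free part of the logic; the tuple encoding of formulas, all function names, the concrete three-pointer heap, and the reading of the formula as a ≠ b * ((ls(a, nil) * t) ∧ (ls(b, nil) * t)) are assumptions of this sketch. It is not the decision procedure developed in the paper, and its list-segment check assumes that a list visits each allocated location once.

```python
from itertools import combinations


def locs(h):
    return set(h) | set(h.values())


def splits(h):
    """Yield every way to split heap h into two domain-disjoint parts."""
    keys = list(h)
    for r in range(len(keys) + 1):
        for part in combinations(keys, r):
            yield ({l: h[l] for l in part},
                   {l: h[l] for l in keys if l not in part})


def is_list_segment(s, h, x, y):
    """ls(x, y): h is exactly a chain of pointers from s(x) to s(y)."""
    if not h:
        return s[x] == s[y]
    cur, seen = s[x], set()
    for _ in range(len(h)):
        if cur not in h or cur in seen:
            return False
        seen.add(cur)
        cur = h[cur]
    return cur == s[y]


def holds(s, h, f, strong):
    """Evaluate formula f on (s, h) under the strong or the weak semantics."""
    op = f[0]
    if op == "emp":
        return not h
    if op == "eq":
        return not h and s[f[1]] == s[f[2]]
    if op == "neq":
        return not h and s[f[1]] != s[f[2]]
    if op == "pto":
        return h == {s[f[1]]: s[f[2]]}
    if op == "ls":
        return is_list_segment(s, h, f[1], f[2])
    if op == "and":
        return holds(s, h, f[1], strong) and holds(s, h, f[2], strong)
    if op == "or":
        return holds(s, h, f[1], strong) or holds(s, h, f[2], strong)
    if op == "not":
        return not holds(s, h, f[1], strong)
    if op == "sep":
        for h1, h2 in splits(h):
            # The strong semantics only admits splits whose shared
            # locations are values of stack variables.
            if strong and not (locs(h1) & locs(h2)) <= set(s.values()):
                continue
            if holds(s, h1, f[1], strong) and holds(s, h2, f[2], strong):
                return True
        return False
    raise ValueError(f"unknown operator: {op}")


TRUE = ("or", ("emp",), ("not", ("emp",)))
phi = ("sep", ("neq", "a", "b"),
       ("and", ("sep", ("ls", "a", "nil"), TRUE),
               ("sep", ("ls", "b", "nil"), TRUE)))

# Two lists starting at a and b that join at location 3 and end in nil.
h = {1: 3, 2: 3, 3: 0}
s_with_c = {"nil": 0, "a": 1, "b": 2, "c": 3}
s_without_c = {"nil": 0, "a": 1, "b": 2}

print(holds(s_with_c, h, phi, strong=True))      # True
print(holds(s_without_c, h, phi, strong=True))   # False
print(holds(s_without_c, h, phi, strong=False))  # True
```

Enumerating all splits is exponential in the size of the heap, which is acceptable only for illustration on models of this size.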


Satisfiability and Semantic Consequence. We define the notions of satisfiability
and semantic consequence parameterized by a finite set of variables x ⊆ Var.
For a formula φ with fvs(φ) ⊆ x, we say that φ is satisfiable w.r.t. x if there is
a model (s, h) with dom(s) = x such that (s, h) ⊨ˢᵗ φ. We say that φ entails ψ
w.r.t. x, in signs φ ⊨ˢᵗₓ ψ, if (s, h) ⊨ˢᵗ φ then also (s, h) ⊨ˢᵗ ψ for all models (s, h)
with dom(s) = x.


2.3 Correspondence of Strong and Weak Semantics on Positive 
Formulas 


We call an SL formula φ positive if it does not contain ¬. Note that, in particular,
this implies that φ does not contain the magic wand -* or the atom t.
In models of positive formulas, all dangling locations are labeled by variables:


Lemma 2. Let φ be positive and (s, h) ⊨ʷᵏ φ. Then, (img(h) \ dom(h)) ⊆ img(s).


As every location shared by heaps h1 and h2 in h1 + h2 is dangling either in h1 or
in h2 (or both), the operations + and ⊎ˢ coincide on models of positive formulas:


Lemma 3. Let (s, h1) ⊨ʷᵏ φ1 and (s, h2) ⊨ʷᵏ φ2 for positive formulas φ1, φ2.
Then h1 + h2 ≠ ⊥ iff h1 ⊎ˢ h2 ≠ ⊥.
Since the semantics coincide on atomic formulas by definition and on * by
Lemma 2, we can easily show that they coincide on all positive formulas:


Lemma 4. Let φ be a positive formula and let (s, h) be a model. Then (s, h) ⊨ʷᵏ φ
iff (s, h) ⊨ˢᵗ φ.


By negating Lemma 4, we have that {(s, h) | (s, h) ⊨ʷᵏ φ} ≠ {(s, h) | (s, h) ⊨ˢᵗ φ}
implies that φ contains negation, either explicitly or in the form of a magic
wand or t. In particular, Lemma 4 implies that the two semantics coincide on
the popular symbolic-heap fragment of separation logic.⁵

We remark that formula φ in Example 1 only employs t but not ¬, -*. Hence,
even if only t were added to the positive fragment, Lemma 4 would no
longer hold. Likewise, Lemma 4 does not hold under intuitionistic semantics: as
the intuitionistic semantics of a predicate p is equivalent to p * t under classic
semantics, it is sufficient to consider φ := a ≠ b * (ls(a, nil) ∧ ls(b, nil)).


3 Deciding the SSL Satisfiability Problem 


The goal of this section is to develop a decision procedure for SSL: 


Theorem 1. Let φ ∈ SL and let x ⊆ Var be a finite set of variables with
fvs(φ) ⊆ x. It is decidable in PSPACE (in |φ| and |x|) whether there exists a
model (s, h) with dom(s) = x and (s, h) ⊨ˢᵗ φ.


Our approach is based on abstracting stack-heap models by abstract memory
states (AMS), which have two key properties that together imply Theorem 1:


Refinement (Theorem 2). If (s1, h1) and (s2, h2) abstract to the same AMS,
then they satisfy the same formulas. That is, the AMS abstraction refines
the satisfaction relation of SSL.

Computability (Theorem 5, Lemmas and 22). For each formula φ, we
can compute (in PSPACE) the set of all AMSs of all models of φ; then, φ is
satisfiable if this set is nonempty.


The AMS abstraction is motivated by the following insights. 


1. The operator ⊎ˢ induces a unique decomposition of the heap into at most
|s| minimal chunks of memory that cannot be further decomposed.

2. To decide whether (s, h) ⊨ˢᵗ φ holds, it is sufficient to know for each chunk
of (s, h) a) which atomic formulas the chunk satisfies and b) which variables
(if any) are allocated in the chunk.


5 Strictly speaking, Lemma 4 implies this only for the symbolic-heap fragment of the
separation logic studied in this paper, i.e., with the list predicate but no other data 
structures. The result can, however, be generalized to symbolic heaps with trees (see 
the dissertation of the first author [31]). Symbolic heaps of bounded treewidth as 
proposed in [21] are an interesting direction for future work. 
We proceed as follows. In Sec. 3.1, we make precise the notion of memory
chunks. In Sec. 3.2, we define abstract memory states (AMS), an abstraction
of models that retains for every chunk precisely the information from point
(2) above. We will prove the refinement theorem in Sec. 3.3. We will show in
Sections 3.4 to 3.6 that we can compute the AMS of the models of a given formula φ,
which allows us to decide satisfiability and entailment problems for SSL. Finally,
we prove the PSPACE-completeness result at the end of this section.


3.1 Memory Chunks 


We will abstract a model (s,h) by abstracting every chunk of h, which is a 
minimal nonempty sub-heap of (s, h) that can be split off of h according to the 
strong-separation semantics. 


Definition 1 (Sub-heap). Let (s, h) be a model. We say that h1 is a sub-heap
of h, in signs h1 ⊑ h, if there is some heap h2 such that h = h1 ⊎ˢ h2. We collect
all sub-heaps in the set subHeaps(s, h). △


The following proposition is an immediate consequence of the above definition: 


Proposition 1. Let (s, h) be a model. Then, (subHeaps(s, h), ⊑, ⊔, ⊓, ¬) is a
Boolean algebra with greatest element h and smallest element ∅, where

– (s, h1) ⊔ (s, h2) := (s, h1 ∪ h2),

– (s, h1) ⊓ (s, h2) := (s, h1 ∩ h2), and

– ¬(s, h1) := (s, h1’), where h1’ ∈ subHeaps(s, h) is the unique sub-heap with
h = h1 ⊎ˢ h1’.


The fact that the sub-models form a Boolean algebra allows us to make the
following definition.⁶


Definition 2 (Chunk). Let (s, h) be a model. A chunk of (s, h) is an atom of
the Boolean algebra (subHeaps(s, h), ⊑, ⊔, ⊓, ¬). We collect all chunks of (s, h)
in the set chunks(s, h). △


Because every element of a Boolean algebra can be uniquely decomposed into 
atoms, we obtain that every heap can be fully decomposed into its chunks: 


Proposition 2. Let (s, h) be a model and let chunks(s, h) = {h1, ..., hn} be its
chunks. Then, h = h1 ⊎ˢ h2 ⊎ˢ ··· ⊎ˢ hn.


Example 2. Let s = {x ↦ 1, y ↦ 3, u ↦ 5, z ↦ 3, w ↦ 7, v ↦ 9} and h =
{1 ↦ 2, 2 ↦ 3, 3 ↦ 8, 4 ↦ 6, 5 ↦ 6, 6 ↦ 3, 7 ↦ 6, 9 ↦ 9, 10 ↦ 11, 11 ↦ 10}.
The model (s, h) is illustrated in Fig. 5. This time, we include the identities of


6 It is an interesting question for future work to relate the chunks considered in this 
paper to the atomic building blocks used in SL symbolic execution engines. Likewise,
it would be interesting to build a symbolic execution engine based on the chunks 
resp. on the AMS abstraction proposed in this paper. 
Fig. 5: Graphical representation of a model consisting of five chunks (left, see
Ex. 2) and its induced AMS (right, see Ex. 5).


the locations in the graphical representation; e.g., 3: y, z represents location 3,
s(y) = 3, s(z) = 3. The model consists of five chunks, h1 := {1 ↦ 2, 2 ↦ 3},
h2 := {9 ↦ 9}, h3 := {4 ↦ 6, 5 ↦ 6, 6 ↦ 3, 7 ↦ 6}, h4 := {3 ↦ 8}, and
h5 := {10 ↦ 11, 11 ↦ 10}. △
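
The chunk decomposition of Example 2 can be reproduced with a short Python sketch. It relies on an observation that follows from the definition of the stack-indexed union but is not stated as an algorithm in the text: two pointers must belong to the same chunk exactly when they are linked through locations that are not values of stack variables. The union-find implementation, the function name chunks, and the dictionary encoding are assumptions of this sketch.

```python
def chunks(stack, heap):
    """Group the pointers of a heap into its chunks.

    Pointers that touch a common location outside img(s) are merged into
    the same group; locations in img(s) never force a merge.
    """
    labeled = set(stack.values())
    pointers = list(heap.items())
    parent = list(range(len(pointers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_pointer_at = {}  # unlabeled location -> index of a pointer touching it
    for i, (src, dst) in enumerate(pointers):
        for loc in (src, dst):
            if loc in labeled:
                continue
            if loc in first_pointer_at:
                parent[find(i)] = find(first_pointer_at[loc])
            else:
                first_pointer_at[loc] = i

    groups = {}
    for i, (src, dst) in enumerate(pointers):
        groups.setdefault(find(i), {})[src] = dst
    return list(groups.values())


# The model of Example 2.
s = {"x": 1, "y": 3, "u": 5, "z": 3, "w": 7, "v": 9}
h = {1: 2, 2: 3, 3: 8, 4: 6, 5: 6, 6: 3, 7: 6, 9: 9, 10: 11, 11: 10}

for chunk in chunks(s, h):
    print(chunk)
```

On this input the sketch yields five groups, matching the chunks h1 through h5 listed in the example.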


We distinguish two types of chunks: those that satisfy SSL atoms and those 
that don’t. 


Definition 3 (Positive and Negative chunk). Let hc ⊆ h be a chunk of
(s, h). hc is a positive chunk if there exists an atomic formula τ such that
(s, hc) ⊨ˢᵗ τ. Otherwise, hc is a negative chunk. We collect the respective chunks
in chunks⁺(s, h) and chunks⁻(s, h).


Example 3. Recall the chunks h1 through h5 from Ex. 2. h1 and h2 are positive
chunks (blue in Fig. 5), h3 to h5 are negative chunks (orange). △


Negative chunks fall into three (not mutually-exclusive) categories: 


Garbage. Chunks with locations that are inaccessible via stack variables. 

Unlabeled dangling pointers. Chunks with an unlabeled sink, i.e., a dangling
location that is not in $\mathsf{img}(s)$ and thus cannot be “made non-dangling”
via composition using $\uplus^{s}$.

Overlaid list segments. Overlaid list segments that cannot be separated via
$\uplus^{s}$ because they are joined at locations that are not in $\mathsf{img}(s)$.


Example 4 (Negative chunks). The chunk $h_3$ from Example 2 contains garbage,
namely the location 4 that cannot be reached via stack variables, and two overlaid
list segments (from 5 to 3 and 7 to 3). The chunk $h_4$ has an unlabeled
dangling pointer. The chunk $h_5$ contains only garbage.


3.2 Abstract Memory States 


In abstract memory states (AMSs), we retain for every chunk enough information 
to (1) determine which atomic formulas the chunk satisfies, and (2) keep track 
of which variables are allocated within each chunk. 


Definition 4. A quadruple $A = (V, E, \rho, \gamma)$ is an abstract memory state, if

1. $V$ is a partition of some finite set of variables, i.e., $V = \{v_1,\ldots,v_n\}$ for
some non-empty disjoint finite sets $v_i \subseteq \mathbf{Var}$,

2. $E \colon V \rightharpoonup V \times \{{=}1, {\ge}2\}$ is a partial function such that there is no $v \in \mathrm{dom}(E)$
with $\mathsf{nil} \in v$,⁷

3. $\rho$ consists of disjoint subsets of $V$ such that every $R \in \rho$ is disjoint from
$\mathrm{dom}(E)$ and there is no $v \in R$ with $\mathsf{nil} \in v$,

4. $\gamma$ is a natural number, i.e., $\gamma \in \mathbb{N}$.


We call $V$ the nodes, $E$ the edges, $\rho$ the negative-allocation constraint and $\gamma$
the garbage-chunk count of $A$. We call the AMS $A = (V, E, \rho, \gamma)$ garbage-free
if $\rho = \emptyset$ and $\gamma = 0$.

We collect the set of all AMSs in $\mathbf{AMS}$. The size of $A$ is given by $|A| :=
|V| + \gamma$. Finally, the allocated variables of an AMS are given by $\mathsf{alloc}(A) :=
\mathrm{dom}(E) \cup \bigcup \rho$. △
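The four conditions of Definition 4 are easy to check mechanically. The following Python sketch is one possible encoding, given here for illustration: nodes are frozensets of variable names, E is a dictionary from nodes to (target, label) pairs, and the labels are written as the strings "=1" and ">=2". The function name `is_ams` and this encoding are assumptions of the sketch, not notation from the paper. The example values encode the AMS of the five-chunk model discussed above, where the edge set is our own reading of the positive chunks.

```python
def is_ams(V, E, rho, gamma):
    """Return True iff (V, E, rho, gamma) satisfies the conditions of Definition 4."""
    nodes = list(V)
    # 1. V is a partition: non-empty, pairwise disjoint blocks of variables.
    if any(len(v) == 0 for v in nodes):
        return False
    if sum(len(v) for v in nodes) != len(set().union(*nodes)):
        return False
    # 2. E is a partial function V -> V x {=1, >=2}; no source node contains nil.
    for src, (tgt, label) in E.items():
        if src not in V or tgt not in V or label not in ("=1", ">=2"):
            return False
        if "nil" in src:
            return False
    # 3. rho: pairwise disjoint subsets of V, disjoint from dom(E), without nil.
    seen = set()
    for R in rho:
        if not R <= V or R & seen or R & set(E):
            return False
        if any("nil" in v for v in R):
            return False
        seen |= R
    # 4. gamma is a natural number.
    return isinstance(gamma, int) and gamma >= 0


# Example encoding: nodes {x}, {y,z}, {u}, {w}, {v}.
x, yz, u, w, v = (frozenset(n) for n in (["x"], ["y", "z"], ["u"], ["w"], ["v"]))
V = {x, yz, u, w, v}
E = {x: (yz, ">=2"), v: (v, "=1")}
rho = {frozenset({w, u}), frozenset({yz})}
ok = is_ams(V, E, rho, 1)
```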


Every model induces an AMS, defined in terms of the following auxiliary
definitions. The equivalence class of variable $x$ with regard to stack $s$ is $[x]^{s}_{=} :=
\{y \mid s(y) = s(x)\}$; the set of all equivalence classes of a given stack $s$ is $\mathsf{cls}_{=}(s) :=
\{[x]^{s}_{=} \mid x \in \mathrm{dom}(s)\}$. We now define the edges induced by a model $(s,h)$. For
every equivalence class $[x]^{s}_{=} \in \mathsf{cls}_{=}(s)$, we set

\[
\mathsf{edges}(s,h)([x]^{s}_{=}) :=
\begin{cases}
([y]^{s}_{=}, {=}1) & \text{if there are } y \in \mathrm{dom}(s) \text{ and } h_c \in \mathsf{chunks}^{+}(s,h) \\
 & \text{with } (s,h_c) \models^{\mathsf{st}} x \mapsto y, \\
([y]^{s}_{=}, {\ge}2) & \text{if there are } y \in \mathrm{dom}(s) \text{ and } h_c \in \mathsf{chunks}^{+}(s,h) \\
 & \text{with } (s,h_c) \models^{\mathsf{st}} \mathtt{ls}(x,y) \wedge \neg (x \mapsto y), \\
\bot & \text{otherwise.}
\end{cases}
\]

Finally, we denote the sets of variables allocated in negative chunks by

\[
\mathsf{alloc}^{-}(s,h) := \{\{[x]^{s}_{=} \mid s(x) \in \mathrm{dom}(h_c)\} \mid h_c \in \mathsf{chunks}^{-}(s,h)\} \setminus \{\emptyset\},
\]

where (equivalence classes of) variables that are allocated in the same negative
chunk are grouped together in a set.
Now we are ready to define the induced AMS of a model.


Definition 5. Let $(s,h)$ be a model. Let $V := \mathsf{cls}_{=}(s)$, $E := \mathsf{edges}(s,h)$, $\rho :=
\mathsf{alloc}^{-}(s,h)$ and $\gamma := |\mathsf{chunks}^{-}(s,h)| - |\mathsf{alloc}^{-}(s,h)|$.
Then, we say that $\mathsf{ams}(s,h) := (V, E, \rho, \gamma)$ is the induced AMS of $(s,h)$. △


Example 5. The induced AMS of the model $(s,h)$ from Ex. 2 is illustrated on
the right-hand side of Fig. 5. The blue box depicts the graph $(V,E)$ induced
by the positive chunks $h_1, h_2$; the negative chunks that allocate variables are
abstracted to the set $\rho = \{\{\{w\},\{u\}\}, \{\{y,z\}\}\}$ (note that the variables $w$ and
$u$ are allocated in the chunk $h_3$ and the aliasing variables $y, z$ are allocated in
$h_4$); and the garbage-chunk count is 1, because $h_5$ is the only negative chunk
that does not allocate stack variables. △


7 The edges of an AMS represent either a single pointer (case “=1”) or a list segment
of length at least two (case “≥2”).

Observe that the induced AMS is indeed an AMS:
Proposition 3. Let $(s,h)$ be a model. Then $\mathsf{ams}(s,h) \in \mathbf{AMS}$.


The reverse also holds: Every AMS is the induced AMS of at least one model; 
in fact, even of a model of linear size. 


Lemma 5 (Realizability of AMS). Let $A = (V, E, \rho, \gamma)$ be an AMS. There
exists a model $(s,h)$ with $\mathsf{ams}(s,h) = A$ whose size is linear in the size of $A$.


The following lemma demonstrates that we only need the $\rho$ and $\gamma$ components
in order to be able to deal with negation and/or the magic wand:


Lemma 6 (Models of Positive Formulas have Garbage-free Abstractions).
Let $(s,h)$ be a model. If $(s,h) \models^{\mathsf{st}} \varphi$ for a positive formula $\varphi$, then
$\mathsf{ams}(s,h)$ is garbage-free.


We abstract SL formulas by the set of AMS of their models: 


Definition 6. Let $s$ be a stack. The SL abstraction w.r.t. $s$, $\alpha_s \colon \mathbf{SL} \to 2^{\mathbf{AMS}}$,
is given by

\[
\alpha_s(\varphi) := \{\mathsf{ams}(s,h) \mid h \in \mathbf{H}, \text{ and } (s,h) \models^{\mathsf{st}} \varphi\}. \quad △
\]

Because AMSs do not retain any information about heap locations, just about
aliasing, abstractions do not differ for stacks with the same equivalence classes:


Lemma 7. Let $s, s'$ be stacks with $\mathsf{cls}_{=}(s) = \mathsf{cls}_{=}(s')$. Then $\alpha_s(\varphi) = \alpha_{s'}(\varphi)$ for
all formulas $\varphi$.

3.3 The Refinement Theorem for SSL 

The main goal of this section is to show the following refinement theorem: 


Theorem 2 (Refinement Theorem). Let $\varphi$ be a formula and let $(s,h_1)$,
$(s,h_2)$ be models with $\mathsf{ams}(s,h_1) = \mathsf{ams}(s,h_2)$. Then $(s,h_1) \models^{\mathsf{st}} \varphi$ iff $(s,h_2) \models^{\mathsf{st}} \varphi$.


We will prove this theorem step by step, characterizing the AMS abstraction 
of all atomic formulas and of the composed models before proving the refinement 
theorem. In the remainder of this section, we fix some model (s, h). 


Abstract Memory States of Atomic Formulas The empty-heap predicate emp is 
only satisfied by the empty heap, i.e., by a heap that consists of zero chunks: 


Lemma 8. $(s,h) \models^{\mathsf{st}} \mathsf{emp}$ iff $\mathsf{ams}(s,h) = (\mathsf{cls}_{=}(s), \emptyset, \emptyset, 0)$.


Lemma 9. 1. $(s,h) \models^{\mathsf{st}} x = y$ iff $\mathsf{ams}(s,h) = (\mathsf{cls}_{=}(s), \emptyset, \emptyset, 0)$ and $[x]^{s}_{=} = [y]^{s}_{=}$.
2. $(s,h) \models^{\mathsf{st}} x \neq y$ iff $\mathsf{ams}(s,h) = (\mathsf{cls}_{=}(s), \emptyset, \emptyset, 0)$ and $[x]^{s}_{=} \neq [y]^{s}_{=}$.


Models of points-to assertions consist of a single positive chunk of size 1:

Lemma 10. Let $E = \{[x]^{s}_{=} \mapsto ([y]^{s}_{=}, {=}1)\}$. $(s,h) \models^{\mathsf{st}} x \mapsto y$ iff $\mathsf{ams}(s,h) =
(\mathsf{cls}_{=}(s), E, \emptyset, 0)$.


Intuitively, the list segment $\mathtt{ls}(x,y)$ is satisfied by models $(s,h)$ that consist
of zero or more positive chunks, corresponding to a (possibly empty) list from
some equivalence class $[x]^{s}_{=}$ to $[y]^{s}_{=}$ via (zero or more) intermediate equivalence
classes $[x_1]^{s}_{=}, \ldots, [x_n]^{s}_{=}$. We will use this intuition to define abstract lists; this
notion allows us to characterize the AMSs arising from abstracting lists.

Definition 7. Let $A = (V, E, \rho, \gamma) \in \mathbf{AMS}$, $s$ be a stack and $x, y \in \mathbf{Var}$. We
say $A$ is an abstract list w.r.t. $x$ and $y$, in signs $A \in \mathsf{AbstLists}(x,y)$, iff

1. $V = \mathsf{cls}_{=}(s)$,

2. $\rho = \emptyset$ and $\gamma = 0$, and

3. we can pick nodes $v_1, \ldots, v_n \in V$ and labels $\iota_1, \ldots, \iota_{n-1} \in \{{=}1, {\ge}2\}$ such
that $x \in v_1$, $y \in v_n$ and $E = \{v_i \mapsto (v_{i+1}, \iota_i) \mid 1 \le i < n\}$. △


Lemma 11. $(s,h) \models^{\mathsf{st}} \mathtt{ls}(x,y)$ iff $\mathsf{ams}(s,h) \in \mathsf{AbstLists}(x,y)$.


Abstract Memory States of Models composed by the Union Operator. Our next
goal is to lift the union operator $\uplus^{s}$ to the abstract domain $\mathbf{AMS}$. We will define
an operator $\bullet$ with the following property:

\[
\text{if } h_1 \uplus^{s} h_2 \neq \bot \text{ then } \mathsf{ams}(s, h_1 \uplus^{s} h_2) = \mathsf{ams}(s,h_1) \bullet \mathsf{ams}(s,h_2).
\]


AMS composition is a partial operation defined only on compatible AMSs.
Compatibility enforces (1) that the AMSs were obtained for equivalent stacks
(i.e., for stacks $s, s'$ with $\mathsf{cls}_{=}(s) = \mathsf{cls}_{=}(s')$), and (2) that there is no double
allocation.


Definition 8 (Compatibility of AMSs). AMSs $A_1 = (V_1, E_1, \rho_1, \gamma_1)$ and
$A_2 = (V_2, E_2, \rho_2, \gamma_2)$ are compatible iff (1) $V_1 = V_2$ and (2) $\mathsf{alloc}(A_1) \cap
\mathsf{alloc}(A_2) = \emptyset$.


Note that if $h_1 \uplus^{s} h_2$ is defined, then $\mathsf{ams}(s,h_1)$ and $\mathsf{ams}(s,h_2)$ are compatible.
The converse is not true, because $\mathsf{ams}(s,h_1)$ and $\mathsf{ams}(s,h_2)$ may be compatible
even if $\mathrm{dom}(h_1) \cap \mathrm{dom}(h_2) \neq \emptyset$.

AMS composition is defined in a point-wise manner on compatible AMSs 
and undefined otherwise. 


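Definition 9 (AMS composition). Let $A_i = (V_i, E_i, \rho_i, \gamma_i)$ for $i = 1, 2$ be
two AMSs. The composition of $A_1, A_2$ is then given by

\[
A_1 \bullet A_2 :=
\begin{cases}
(V_1, E_1 \cup E_2, \rho_1 \cup \rho_2, \gamma_1 + \gamma_2) & \text{if } A_1, A_2 \text{ compatible,} \\
\bot & \text{otherwise.}
\end{cases}
\]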
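Compatibility and composition translate directly into code. The sketch below is a minimal Python rendering under an encoding chosen here for illustration: an AMS is a named tuple whose edges are stored as (source, target, label) triples, and the undefined result ⊥ is represented by `None`. The names `AMS`, `alloc`, `compatible` and `compose` are assumptions of this sketch.

```python
from typing import NamedTuple, Optional


class AMS(NamedTuple):
    V: frozenset      # nodes: equivalence classes of variables
    E: frozenset      # edges as (source, target, label), label in {"=1", ">=2"}
    rho: frozenset    # negative-allocation constraint: sets of nodes
    gamma: int        # garbage-chunk count


def alloc(a: AMS) -> frozenset:
    """alloc(A) = dom(E) united with the union of all sets in rho."""
    return frozenset(src for src, _tgt, _label in a.E) | frozenset(
        v for R in a.rho for v in R
    )


def compatible(a1: AMS, a2: AMS) -> bool:
    """Same nodes and no variable allocated on both sides."""
    return a1.V == a2.V and not (alloc(a1) & alloc(a2))


def compose(a1: AMS, a2: AMS) -> Optional[AMS]:
    """Point-wise composition; None stands for the undefined result."""
    if not compatible(a1, a2):
        return None
    return AMS(a1.V, a1.E | a2.E, a1.rho | a2.rho, a1.gamma + a2.gamma)


# Two single-edge AMSs over the nodes {x}, {y}, {nil}.
x, y, nil = frozenset({"x"}), frozenset({"y"}), frozenset({"nil"})
V = frozenset({x, y, nil})
a_xy = AMS(V, frozenset({(x, y, "=1")}), frozenset(), 0)
a_yn = AMS(V, frozenset({(y, nil, ">=2")}), frozenset(), 1)
both = compose(a_xy, a_yn)
```

Composing `a_xy` with itself returns `None`, since the node {x} would be allocated twice.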


Lemma 12. Let $s$ be a stack and let $h_1, h_2$ be heaps. If $h_1 \uplus^{s} h_2 \neq \bot$ then
$\mathsf{ams}(s,h_1) \bullet \mathsf{ams}(s,h_2) \neq \bot$.




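We next show that $\mathsf{ams}(s, h_1 \uplus^{s} h_2) = \mathsf{ams}(s,h_1) \bullet \mathsf{ams}(s,h_2)$ whenever $h_1 \uplus^{s}
h_2$ is defined:


Lemma 13 (Homomorphism of composition). Let $(s,h_1)$, $(s,h_2)$ be models
with $h_1 \uplus^{s} h_2 \neq \bot$. Then, $\mathsf{ams}(s, h_1 \uplus^{s} h_2) = \mathsf{ams}(s,h_1) \bullet \mathsf{ams}(s,h_2)$.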


To show the refinement theorem, we need one additional property of AMS
composition. If an AMS $A$ of a model $(s,h)$ can be decomposed into two smaller
AMSs $A = A_1 \bullet A_2$, it is also possible to decompose the heap $h$ into smaller heaps
$h_1, h_2$ with $\mathsf{ams}(s,h_i) = A_i$:


Lemma 14 (Decomposability of AMS). Let $\mathsf{ams}(s,h) = A_1 \bullet A_2$. There
exist $h_1, h_2$ with $h = h_1 \uplus^{s} h_2$, $\mathsf{ams}(s,h_1) = A_1$ and $\mathsf{ams}(s,h_2) = A_2$.

These results suffice to prove the Refinement Theorem stated at the beginning
of this section (see the extended version [33] for a proof).


Corollary 1. Let $(s,h)$ be a model and $\varphi$ be a formula. $(s,h) \models^{\mathsf{st}} \varphi$ iff $\mathsf{ams}(s,h) \in
\alpha_s(\varphi)$.


3.4 Recursive Equations for Abstract Memory States 


In this section, we derive recursive equations that reduce the set of AMSs $\alpha_s(\varphi)$ for
arbitrary compound formulas to the sets of AMSs of the constituent formulas of $\varphi$.
In the next sections, we will show that we can actually evaluate these equations,
thus obtaining an algorithm for computing the abstraction of arbitrary formulas.


Lemma 15. $\alpha_s(\varphi_1 \wedge \varphi_2) = \alpha_s(\varphi_1) \cap \alpha_s(\varphi_2)$.
Lemma 16. $\alpha_s(\varphi_1 \vee \varphi_2) = \alpha_s(\varphi_1) \cup \alpha_s(\varphi_2)$.
Lemma 17. $\alpha_s(\neg\varphi_1) = \{\mathsf{ams}(s,h) \mid h \in \mathbf{H}\} \setminus \alpha_s(\varphi_1)$.


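The Separating Conjunction. In Section 3.3 we defined the composition operation,
$\bullet$, on pairs of AMSs. We now lift this operation to sets of AMSs $\mathbf{A}_1, \mathbf{A}_2$:

\[
\mathbf{A}_1 \bullet \mathbf{A}_2 := \{A_1 \bullet A_2 \mid A_1 \in \mathbf{A}_1, A_2 \in \mathbf{A}_2, A_1 \bullet A_2 \neq \bot\}.
\]

Lemma 13 implies that $\alpha_s$ is a homomorphism from formulas and $*$ to sets
of AMSs and $\bullet$:


Lemma 18. For all $\varphi_1, \varphi_2$, $\alpha_s(\varphi_1 * \varphi_2) = \alpha_s(\varphi_1) \bullet \alpha_s(\varphi_2)$.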


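The septraction operator. We next define an abstract septraction operator $-\!\!\bullet$
that relates to $\bullet$ in the same way that $-\!\!\circledast$ relates to $*$. For two sets of AMSs
$\mathbf{A}_1, \mathbf{A}_2$ we set:

\[
\mathbf{A}_1 \mathbin{-\!\!\bullet} \mathbf{A}_2 := \{A \in \mathbf{AMS} \mid \text{there exists } A_1 \in \mathbf{A}_1 \text{ s.t. } A \bullet A_1 \in \mathbf{A}_2\}.
\]

Then, $\alpha_s$ is a homomorphism from formulas and $-\!\!\circledast$ to sets of AMSs and $-\!\!\bullet$:


Lemma 19. For all $\varphi_1, \varphi_2$, $\alpha_s(\varphi_1 \mathbin{-\!\!\circledast} \varphi_2) = \alpha_s(\varphi_1) \mathbin{-\!\!\bullet} \alpha_s(\varphi_2)$.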
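The two set-level operators can be sketched in a few lines of Python. This is an illustration under assumptions made here: an AMS is a plain tuple (V, E, rho, gamma) of hashable components, `None` stands for ⊥, and, because the abstract septraction quantifies over the whole (infinite) set of AMSs, the sketch takes an explicit finite `universe` of candidates as a third argument. All function names are hypothetical.

```python
def alloc(a):
    V, E, rho, gamma = a
    return {src for src, _tgt, _label in E} | {v for R in rho for v in R}


def compose(a1, a2):
    """Composition of two AMS tuples, or None if they are incompatible."""
    if a1[0] != a2[0] or alloc(a1) & alloc(a2):
        return None
    return (a1[0], a1[1] | a2[1], a1[2] | a2[2], a1[3] + a2[3])


def compose_sets(as1, as2):
    """All defined compositions of an element of as1 with an element of as2."""
    out = set()
    for a1 in as1:
        for a2 in as2:
            c = compose(a1, a2)
            if c is not None:
                out.add(c)
    return out


def septract_sets(as1, as2, universe):
    """Candidates A in universe for which some A1 in as1 gives A composed with A1 in as2."""
    return {a for a in universe if any(compose(a, a1) in as2 for a1 in as1)}


# Small example over the nodes {x}, {y}, {nil}.
x, y, nil = frozenset({"x"}), frozenset({"y"}), frozenset({"nil"})
V = frozenset({x, y, nil})
a_emp = (V, frozenset(), frozenset(), 0)
a_xy = (V, frozenset({(x, y, "=1")}), frozenset(), 0)
a_yn = (V, frozenset({(y, nil, ">=2")}), frozenset(), 0)
both = compose(a_xy, a_yn)
composed = compose_sets({a_xy}, {a_yn, a_xy})
septracted = septract_sets({a_xy}, {both}, {a_emp, a_xy, a_yn})
```

In the example, the only candidate that can be completed by `a_xy` to reach `both` is `a_yn`, mirroring how septraction removes the part of the heap described by its left operand.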
3.5 Refining the Refinement Theorem: Bounding Garbage


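Even though we have now characterized the set $\alpha_s(\varphi)$ for every formula $\varphi$, we
do not yet have a way to implement AMS computation: While $\alpha_s(\varphi)$ is finite
if $\varphi$ is a spatial atom, the set is infinite in general; see the cases $\alpha_s(\neg\varphi)$ and
$\alpha_s(\varphi_1 \mathbin{-\!\!\circledast} \varphi_2)$. However, we note that for a fixed stack $s$ only the garbage-chunk
count $\gamma$ of an AMS $(V, E, \rho, \gamma) \in \alpha_s(\varphi)$ can be of arbitrary size, while the size
of the nodes $V$, the edges $E$ and the negative-allocation constraint $\rho$ is bounded
by $|s|$. Fortunately, to decide the satisfiability of any fixed formula $\varphi$, it is not
necessary to keep track of arbitrarily large garbage-chunk counts.

We introduce the chunk size $\lceil\varphi\rceil$ of a formula $\varphi$, which provides an upper
bound on the number of chunks that may be necessary to satisfy and/or falsify
the formula; $\lceil\varphi\rceil$ is defined as follows:

– $\lceil\mathsf{emp}\rceil = \lceil x \mapsto y\rceil = \lceil\mathtt{ls}(x,y)\rceil = \lceil x = y\rceil = \lceil x \neq y\rceil := 1$
– $\lceil\varphi * \psi\rceil := \lceil\varphi\rceil + \lceil\psi\rceil$
– $\lceil\varphi \mathbin{-\!\!\circledast} \psi\rceil := \lceil\psi\rceil$
– $\lceil\varphi \wedge \psi\rceil = \lceil\varphi \vee \psi\rceil := \max(\lceil\varphi\rceil, \lceil\psi\rceil)$
– $\lceil\neg\varphi\rceil := \lceil\varphi\rceil$.

Observe that $\lceil\varphi\rceil \le |\varphi|$ for all $\varphi$. Intuitively, $\lceil\varphi\rceil - 1$ is an upper bound on the
number of times the operation $\uplus^{s}$ needs to be applied when checking whether
$(s,h) \models^{\mathsf{st}} \varphi$. For example, let $\psi := x \mapsto y * ((b \mapsto c) \mathbin{-\!\!\circledast} \mathtt{ls}(a,c))$. Then $\lceil\psi\rceil = 2$,
and to verify that $\psi$ holds in a model that consists of a pointer from $x$ to $y$ and
a list segment from $a$ to $b$, it suffices to split this model $\lceil\psi\rceil - 1 = 1$ many times
using $\uplus^{s}$ (into the pointer and the list segment).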
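The chunk size is a simple structural recursion. The following Python sketch computes it over a tuple-encoded formula; the encoding (operator tags such as "pto" for points-to and "sept" for septraction) is an assumption of this illustration, and the septraction case is taken to use the chunk size of its right-hand operand.

```python
ATOMS = {"emp", "pto", "ls", "eq", "neq"}


def chunk_size(f):
    """Chunk size of a formula given as a nested tuple (tag, operands...)."""
    op = f[0]
    if op in ATOMS:
        return 1
    if op == "star":                       # separating conjunction
        return chunk_size(f[1]) + chunk_size(f[2])
    if op == "sept":                       # septraction: size of the right operand
        return chunk_size(f[2])
    if op in ("and", "or"):
        return max(chunk_size(f[1]), chunk_size(f[2]))
    if op == "not":
        return chunk_size(f[1])
    raise ValueError(f"unknown operator: {op}")


# psi := x |-> y * ((b |-> c) -(*) ls(a, c)), the example from the text.
psi = ("star", ("pto", "x", "y"), ("sept", ("pto", "b", "c"), ("ls", "a", "c")))
size_of_psi = chunk_size(psi)
```

For `psi` the recursion gives 1 for the points-to atom plus 1 for the septraction, matching the value 2 stated above.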

We generalize the refinement theorem, Theorem 2, to models whose AMSs
differ in their garbage-chunk count, provided both garbage-chunk counts exceed
the chunk size of the formula:


Theorem 3 (Refined Refinement Theorem). Let $\varphi$ be a formula with $\lceil\varphi\rceil =
k$. Let $m \ge k$, $n \ge k$ and let $(s,h_1)$ and $(s,h_2)$ be models such that $\mathsf{ams}(s,h_1) =
(V, E, \rho, m)$, $\mathsf{ams}(s,h_2) = (V, E, \rho, n)$. Then, $(s,h_1) \models^{\mathsf{st}} \varphi$ iff $(s,h_2) \models^{\mathsf{st}} \varphi$.


This implies that $\varphi$ is satisfiable over stack $s$ iff $\varphi$ is satisfiable by a heap
that contains at most $\lceil\varphi\rceil$ garbage chunks:


Corollary 2. Let $\varphi$ be a formula with $\lceil\varphi\rceil = k$. Then $\varphi$ is satisfiable over stack
$s$ iff there exists a heap $h$ such that (1) $\mathsf{ams}(s,h) = (V, E, \rho, \gamma)$ for some $\gamma \le k$
and (2) $(s,h) \models^{\mathsf{st}} \varphi$.


3.6 Deciding SSL by AMS Computation 


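Due to Cor. 2, we can decide the SSL satisfiability problem by means of a function
$\mathsf{abst}_s(\varphi)$ that computes the (finite) intersection of the (possibly infinite) set $\alpha_s(\varphi)$
and the (finite) set $\mathbf{AMS}_{k,s} := \{(V, E, \rho, \gamma) \in \mathbf{AMS} \mid V = \mathsf{cls}_{=}(s) \text{ and } \gamma \le k\}$
for $k = \lceil\varphi\rceil$. We define $\mathsf{abst}_s(\varphi)$ in Fig. 6. For atomic predicates we only need to
consider garbage-chunk-count 0, whereas the cases $*$, $-\!\!\circledast$, $\wedge$ and $\vee$ require lifting
the bound on the garbage-chunk count from $m$ to $n \ge m$.

\begin{align*}
\mathsf{abst}_s(\mathsf{emp}) &:= \{(\mathsf{cls}_{=}(s), \emptyset, \emptyset, 0)\} \\
\mathsf{abst}_s(x = y) &:= \text{if } s(x) = s(y) \text{ then } \{(\mathsf{cls}_{=}(s), \emptyset, \emptyset, 0)\} \text{ else } \emptyset \\
\mathsf{abst}_s(x \neq y) &:= \text{if } s(x) \neq s(y) \text{ then } \{(\mathsf{cls}_{=}(s), \emptyset, \emptyset, 0)\} \text{ else } \emptyset \\
\mathsf{abst}_s(x \mapsto y) &:= \{(\mathsf{cls}_{=}(s), \{[x]^{s}_{=} \mapsto ([y]^{s}_{=}, {=}1)\}, \emptyset, 0)\} \\
\mathsf{abst}_s(\mathtt{ls}(x,y)) &:= \mathsf{AbstLists}(x,y) \cap \mathbf{AMS}_{0,s} \\
\mathsf{abst}_s(\varphi_1 * \varphi_2) &:= \mathbf{AMS}_{\lceil\varphi_1 * \varphi_2\rceil,s} \cap
  \big(\mathsf{lift}_{\lceil\varphi_1\rceil \nearrow \lceil\varphi_1 * \varphi_2\rceil}(\mathsf{abst}_s(\varphi_1)) \bullet \mathsf{lift}_{\lceil\varphi_2\rceil \nearrow \lceil\varphi_1 * \varphi_2\rceil}(\mathsf{abst}_s(\varphi_2))\big) \\
\mathsf{abst}_s(\varphi_1 \mathbin{-\!\!\circledast} \varphi_2) &:= \mathbf{AMS}_{\lceil\varphi_1 -\!\circledast\, \varphi_2\rceil,s} \cap
  \big(\mathsf{abst}_s(\varphi_1) \mathbin{-\!\!\bullet} \mathsf{lift}_{\lceil\varphi_2\rceil \nearrow \lceil\varphi_1\rceil + \lceil\varphi_2\rceil}(\mathsf{abst}_s(\varphi_2))\big) \\
\mathsf{abst}_s(\varphi_1 \wedge \varphi_2) &:= \mathsf{lift}_{\lceil\varphi_1\rceil \nearrow \lceil\varphi_1 \wedge \varphi_2\rceil}(\mathsf{abst}_s(\varphi_1)) \cap \mathsf{lift}_{\lceil\varphi_2\rceil \nearrow \lceil\varphi_1 \wedge \varphi_2\rceil}(\mathsf{abst}_s(\varphi_2)) \\
\mathsf{abst}_s(\varphi_1 \vee \varphi_2) &:= \mathsf{lift}_{\lceil\varphi_1\rceil \nearrow \lceil\varphi_1 \vee \varphi_2\rceil}(\mathsf{abst}_s(\varphi_1)) \cup \mathsf{lift}_{\lceil\varphi_2\rceil \nearrow \lceil\varphi_1 \vee \varphi_2\rceil}(\mathsf{abst}_s(\varphi_2)) \\
\mathsf{abst}_s(\neg\varphi_1) &:= \mathbf{AMS}_{\lceil\varphi_1\rceil,s} \setminus \mathsf{abst}_s(\varphi_1)
\end{align*}

Fig. 6: Computing the abstract memory states of the models of $\varphi$ with stack $s$.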


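Definition 10. Let $m, n \in \mathbb{N}$ with $m \le n$ and let $A = (V, E, \rho, \gamma) \in \mathbf{AMS}$. The
bound-lifting of $A$ from $m$ to $n$ is

\[
\mathsf{lift}_{m \nearrow n}(A) :=
\begin{cases}
\{A\} & \text{if } \gamma < m, \\
\{(V, E, \rho, k) \mid m \le k \le n\} & \text{if } \gamma = m.
\end{cases}
\]

We generalize bound-lifting to sets of AMSs: $\mathsf{lift}_{m \nearrow n}(\mathbf{A}) := \bigcup_{A \in \mathbf{A}} \mathsf{lift}_{m \nearrow n}(A)$. △

As a consequence of Theorem 3, bound-lifting is sound for all $n \ge \lceil\varphi\rceil$, i.e.,

\[
\mathsf{lift}_{\lceil\varphi\rceil \nearrow n}(\alpha_s(\varphi) \cap \mathbf{AMS}_{\lceil\varphi\rceil,s}) = \alpha_s(\varphi) \cap \mathbf{AMS}_{n,s}.
\]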
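Bound-lifting only touches the garbage-chunk count, which the following Python sketch makes explicit. It is an illustration with assumed names (`lift`, `lift_set`) and an assumed tuple encoding (V, E, rho, gamma) with hashable components; since Definition 10 gives no case for a count above m, the sketch raises an error there.

```python
def lift(ams, m, n):
    """Bound-lifting of a single AMS tuple (V, E, rho, gamma) from m to n."""
    if m > n:
        raise ValueError("bound-lifting requires m <= n")
    V, E, rho, gamma = ams
    if gamma < m:
        return {ams}                                    # below the bound: unchanged
    if gamma == m:
        return {(V, E, rho, k) for k in range(m, n + 1)}  # at the bound: all counts m..n
    raise ValueError("bound-lifting is only defined for gamma <= m")


def lift_set(ams_set, m, n):
    """Bound-lifting generalized to a set of AMS tuples."""
    return set().union(*(lift(a, m, n) for a in ams_set))


# Example: one node {x}, no edges, no negative allocation, one garbage chunk.
V = frozenset({frozenset({"x"})})
a = (V, frozenset(), frozenset(), 1)
unchanged = lift(a, 2, 4)
widened = lift(a, 1, 3)
```

An AMS whose count equals the old bound m stands for "m or more garbage chunks", so lifting replaces it by one AMS for each count between m and n.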


By combining this observation with the lemmas characterizing $\alpha_s$, that is
Lemmas 8, 9, 10, 18 and 19, we obtain the correctness of $\mathsf{abst}_s(\varphi)$:


Theorem 4. Let $s$ be a stack and $\varphi$ be a formula. Then, $\mathsf{abst}_s(\varphi) = \alpha_s(\varphi) \cap
\mathbf{AMS}_{\lceil\varphi\rceil,s}$.


Computability of $\mathsf{abst}_s(\varphi)$. We note that the operators $\bullet$, $-\!\!\bullet$, $\cap$, $\cup$ and $\setminus$ are all
computable as the sets that occur in the definition of $\mathsf{abst}_s(\varphi)$ are all finite. It
remains to argue that we can compute the set of AMSs for all atomic formulas.
This is trivial for $\mathsf{emp}$, (dis-)equalities, and points-to assertions. For the
list-segment predicate, we note that the set $\mathsf{abst}_s(\mathtt{ls}(x,y)) = \mathsf{AbstLists}(x,y) \cap
\mathbf{AMS}_{0,s}$ can be easily computed as there are only finitely many abstract lists
w.r.t. the set of nodes $V = \mathsf{cls}_{=}(s)$. We obtain the following results:


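Corollary 3. Let $s$ be a (finite) stack. Then $\mathsf{abst}_s(\varphi)$ is computable for all
formulas $\varphi$.


Theorem 5. Let $\varphi \in \mathbf{SL}$ and let $\mathbf{x} \subseteq \mathbf{Var}$ be a finite set of variables with
$\mathsf{fvs}(\varphi) \subseteq \mathbf{x}$. It is decidable whether there exists a model $(s,h)$ with $\mathrm{dom}(s) = \mathbf{x}$
and $(s,h) \models^{\mathsf{st}} \varphi$.


Corollary 4. $\varphi \models^{\mathsf{st}}_{\mathbf{x}} \psi$ is decidable for all finite sets of variables $\mathbf{x} \subseteq \mathbf{Var}$ and
$\varphi, \psi \in \mathbf{SL}$.

\begin{align*}
\mathsf{qbf\_to\_sl}(F) &:= \mathsf{emp} \wedge \bigwedge_{\text{pairwise different QBF variables } x, y} x \neq y \wedge \mathsf{aux}(F) \\
\mathsf{aux}(x) &:= (x \mapsto \mathsf{nil}) * \mathsf{t} &
\mathsf{aux}(\neg x) &:= \neg\mathsf{aux}(x) \\
\mathsf{aux}(F \wedge G) &:= \mathsf{aux}(F) \wedge \mathsf{aux}(G) &
\mathsf{aux}(F \vee G) &:= \mathsf{aux}(F) \vee \mathsf{aux}(G) \\
\mathsf{aux}(\exists x.\, F) &:= (x \mapsto \mathsf{nil} \vee \mathsf{emp}) \mathbin{-\!\!\circledast} \mathsf{aux}(F) &
\mathsf{aux}(\forall x.\, F) &:= (x \mapsto \mathsf{nil} \vee \mathsf{emp}) \mathbin{-\!\!*} \mathsf{aux}(F)
\end{align*}

Fig. 7: Translation qbf_to_sl(F) from closed QBF formula $F$ (in negation normal
form) to a formula that is satisfiable iff $F$ is true.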


3.7 Complexity of the SSL Satisfiability Problem 


It is easy to see that the algorithm $\mathsf{abst}_s(\varphi)$ runs in exponential time. We
conclude this section with a proof that SSL satisfiability and entailment are actually
PSPACE-complete.


PSPACE-hardness. An easy reduction from quantified Boolean formulas (QBF)
shows that the SSL satisfiability problem is PSPACE-hard. The reduction is
presented in Fig. 7. We encode positive literals $x$ by $(x \mapsto \mathsf{nil}) * \mathsf{t}$ (the heap
contains the pointer $x \mapsto \mathsf{nil}$) and negative literals by $\neg((x \mapsto \mathsf{nil}) * \mathsf{t})$ (the heap
does not contain the pointer $x \mapsto \mathsf{nil}$). The magic wand is used to simulate
universals (i.e., to enforce that we consider both the case $x \mapsto \mathsf{nil}$ and the case
$\mathsf{emp}$, setting $x$ both to true and to false). Analogously, septraction is used to
simulate existentials. Similar reductions can be found (for standard SL) in [12].
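The reduction can be written as a short recursive function. The Python sketch below is an illustration: QBF formulas are assumed to be given as nested tuples in negation normal form, and the resulting SSL formula is rendered as a string with ASCII stand-ins chosen here (`|->` for points-to, `-(*)` for septraction, `-*` for the magic wand, `~` for negation, `t` for true). None of these names or renderings come from the paper.

```python
from itertools import combinations


def variables(f):
    """All QBF variables occurring in the tuple-encoded formula f."""
    op = f[0]
    if op == "var":
        return {f[1]}
    if op == "not":
        return variables(f[1])
    if op in ("and", "or"):
        return variables(f[1]) | variables(f[2])
    if op in ("exists", "forall"):
        return {f[1]} | variables(f[2])
    raise ValueError(f"unknown operator: {op}")


def aux(f):
    """Translate the QBF matrix and quantifiers as in the figure."""
    op = f[0]
    if op == "var":
        return f"(({f[1]} |-> nil) * t)"
    if op == "not":                        # NNF: negation occurs only in front of variables
        return "~" + aux(f[1])
    if op == "and":
        return f"({aux(f[1])} & {aux(f[2])})"
    if op == "or":
        return f"({aux(f[1])} | {aux(f[2])})"
    if op == "exists":                     # septraction simulates the existential
        return f"(({f[1]} |-> nil | emp) -(*) {aux(f[2])})"
    if op == "forall":                     # magic wand simulates the universal
        return f"(({f[1]} |-> nil | emp) -* {aux(f[2])})"
    raise ValueError(f"unknown operator: {op}")


def qbf_to_sl(f):
    """emp, pairwise disequalities between all QBF variables, and aux(f)."""
    distinct = [f"{x} != {y}" for x, y in combinations(sorted(variables(f)), 2)]
    return " & ".join(["emp", *distinct, aux(f)])


# forall x. exists y. (x or not y)
F = ("forall", "x", ("exists", "y", ("or", ("var", "x"), ("not", ("var", "y")))))
sl_formula = qbf_to_sl(F)
```

The output grows linearly in the size of the input formula, plus a quadratic number of disequalities between the QBF variables.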


Lemma 20. The SSL satisfiability problem is PSPACE-hard (even without the
$\mathtt{ls}$ predicate).


Note that this reduction simultaneously proves the PSPACE-hardness of SSL
model checking: If $F$ is a QBF formula over variables $x_1, \ldots, x_n$, then qbf_to_sl(F)
is satisfiable iff $(\{x_i \mapsto \ell_i \mid 1 \le i \le n\}, \emptyset) \models^{\mathsf{st}} \mathsf{qbf\_to\_sl}(F)$ for some locations $\ell_i$.


PSPACE-membership. For every stack $s$ and every bound on the garbage-chunk
count of the AMSs we consider, it is possible to encode every AMS by a string of
polynomial length.


Lemma 21. Let $k \in \mathbb{N}$, let $s$ be a stack and $n := k + |s|$. There exists an injective
function $\mathsf{encode} \colon \mathbf{AMS}_{k,s} \to \{0,1\}^{*}$ such that

\[
|\mathsf{encode}(A)| \in O(n \log(n)) \text{ for all } A \in \mathbf{AMS}_{k,s}.
\]

An enumeration-based implementation of the algorithm in Fig. 6 (that has
to keep in memory at most one AMS per subformula at any point in the
computation) therefore runs in PSPACE:


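Lemma 22. Let $\varphi \in \mathbf{SL}$ and let $\mathbf{x} \subseteq \mathbf{Var}$ be a finite set of variables with
$\mathsf{fvs}(\varphi) \subseteq \mathbf{x}$. It is decidable in PSPACE (in $|\varphi|$ and $|\mathbf{x}|$) whether there exists a
model $(s,h)$ with $\mathrm{dom}(s) = \mathbf{x}$ and $(s,h) \models^{\mathsf{st}} \varphi$.


The PSPACE-completeness result, Theorem 1, follows by combining
Lemmas 20 and 22.

\[
\{x \mapsto z\}\ x.\mathsf{next} := y\ \{x \mapsto y\}
\qquad
\{\mathsf{emp}\}\ \mathsf{malloc}(x)\ \{x \mapsto m\}
\]
\[
\{x \mapsto z\}\ \mathsf{free}(x)\ \{\mathsf{emp}\}
\qquad
\{\mathsf{emp}\}\ x := y\ \{x = y\}
\]
\[
\dfrac{x \text{ different from } y}{\{y \mapsto z\}\ x := y.\mathsf{next}\ \{y \mapsto z * x = z\}}
\qquad
\dfrac{\varphi \text{ is } x = y \text{ or } x \neq y}{\{\mathsf{emp}\}\ \mathsf{assume}(\varphi)\ \{\varphi\}}
\]

Fig. 8: Local proof rules of program statements for forward symbolic execution.

\[
\text{Frame rule}\quad
\dfrac{\{P\}\ c\ \{Q\}}{\{A * P\}\ c\ \{A[\mathbf{x}'/\mathbf{x}] * Q\}}
\quad \mathbf{x} = \mathsf{modifiedVars}(c),\ \mathbf{x}' \text{ fresh}
\]
\[
\text{Materialization}\quad
\dfrac{\{P\}\ c\ \{Q\}}{\{P\}\ c\ \{x \mapsto z * ((x \mapsto z) \mathbin{-\!\!\circledast} Q)\}}
\quad Q \models^{\mathsf{st}} \neg((x \mapsto \mathsf{nil}) \mathbin{-\!\!\circledast} \mathsf{t}),\ z \text{ fresh}
\]

Fig. 9: The frame and the materialization rule for forward symbolic execution.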


4 Program Verification with Strong-Separation Logic 


Our main practical motivation behind SSL is to obtain a decidable logic that can
be used for fully automatically discharging verification conditions (VCs) in a
Hoare-style verification proof. Discharging VCs can be automated by calculi that
symbolically execute pre-conditions forward resp. post-conditions backward, and
then invoke an entailment checker. Symbolic execution calculi typically either
introduce first-order quantifiers or fresh variables in order to deal with updates
to the program variables. We leave the extension of SSL with support for
quantifiers for future work and in this paper develop a forward symbolic execution
calculus based on fresh variables.

We target the usual Hoare-style setting where a verification engineer annotates
the pre- and post-condition of a function and provides loop invariants. We
exemplify two annotated functions in Fig. 10: the left function reverses a list
and the right function copies a list. In addition to the program variables, our
annotations may contain logical variables (also known as ghost variables); for
example, the annotations of list reverse only contain program variables, while
the annotations of list copy also contain the logical variable $u$ (which is assumed
to be equal to $x$ in the pre-condition).


Forward Symbolic Execution Rules. We state local proof rules for a simple
heap-manipulating programming language in Fig. 8. We remark that we do not include
a rule for the statement x := x.next for ease of exposition; however, this is w.l.o.g.
because x := x.next can be simulated by the statements y := x.next; x := y at the
expense of introducing an additional program variable y. Our only non-standard
choice is the modelling of the malloc statement: we assume a special program
variable m, which is never referenced by any program statement and only used

in the modelling; the malloc statement updates the value of the variable m
to the target of the newly allocated memory cell; this modelling justifies the
proof rule for malloc stated in Fig. 8. For a small-step operational semantics of
our program statements we refer the reader to the extended version [33]. The
rules for the program statements in Fig. 8 are local in the sense that they only
deal with a single pointer or the empty heap. The rules in Fig. 9 are the main
rules of our forward symbolic execution calculus. The frame rule is essential for
lifting the local proof rules to larger heaps. The materialization rule ensures that
the frame rule can be applied whenever the pre-condition of a local proof rule
can be met. We now give more details. For a sequence of program statements
$\mathbf{c} = c_1 \cdots c_k$ and a pre-condition $P_{\mathit{start}}$ the symbolic execution calculus derives
triples $\{P_{\mathit{start}}\}\ c_1 \cdots c_i\ \{Q_i\}$ for all $1 \le i \le k$. In order to proceed from $i$ to $i+1$,
either 1) only the frame rule is applied or 2) the materialization rule is applied
first followed by an application of the frame rule. The frame rule can be applied
if the formula $Q_i$ has the shape $Q_i = A * P$, where $A$ is suitably chosen and $P$
is the pre-condition of the local proof rule for statement $c_i$. Then, $Q_{i+1}$ is given
by $Q_{i+1} = A[\mathbf{x}'/\mathbf{x}] * Q$, where $\mathbf{x} = \mathsf{modifiedVars}(c)$, $\mathbf{x}'$ are fresh copies of the
variables $\mathbf{x}$ and $Q$ is the right hand side of the local proof rule for statement
$c_i$, i.e., we have $\{P\}\ c_i\ \{Q\}$. Note that the frame rule requires substituting the
modified program variables with fresh copies: We set $\mathsf{modifiedVars}(c) := \{x, m\}$
for c = malloc(x), $\mathsf{modifiedVars}(c) := \{x\}$ for c = x := y.next and c = x := y,
and $\mathsf{modifiedVars}(c) := \emptyset$, otherwise. The materialization rule may be applied in
order to ensure that $Q_i$ has the shape $Q_i = A * P$. This is not needed in case
$P = \mathsf{emp}$ but may be necessary for $P = x \mapsto y$. We note that $Q_i$ guarantees
that a pointer $x$ is allocated iff $Q_i \models^{\mathsf{st}} \neg((x \mapsto \mathsf{nil}) \mathbin{-\!\!\circledast} \mathsf{t})$. Under this condition, the
rule allows introducing a name $z$ for the target of the pointer $x$. We remark that
while backward-symbolic execution calculi typically employ the magic wand,
our forward calculus makes use of the dual septraction operator; we were able to
design a general rule that guarantees a predicate of shape $Q_i = A * P$ without
the need of coming up with dedicated rules for, e.g., unfolding list predicates.


8 m is a special program variable introduced for modelling malloc.
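One application of the frame rule can be sketched as follows. The Python code is an illustration under assumptions made here: statements and formulas are nested tuples, the statement tags ("malloc", "load" for x := y.next, "assign", "store", "free", "assume") are hypothetical names, and a fresh copy of a variable is obtained by appending a prime, which is only fresh if no primed variable of that name is already in use.

```python
def modified_vars(stmt):
    """Variables modified by a statement, following the definition in the text."""
    kind = stmt[0]
    if kind == "malloc":               # malloc(x) changes x and the special variable m
        return {stmt[1], "m"}
    if kind in ("load", "assign"):     # x := y.next  and  x := y
        return {stmt[1]}
    return set()                       # all other statements modify no variables


def rename(formula, mapping):
    """Replace variable names in a tuple-encoded formula; formula[0] is the operator tag."""
    head, *args = formula
    return (head, *(rename(a, mapping) if isinstance(a, tuple) else mapping.get(a, a)
                    for a in args))


def frame(A, stmt, post):
    """Given Q_i = A * P and the local post-condition, build A[x'/x] * Q."""
    mapping = {x: x + "'" for x in modified_vars(stmt)}   # assumed-fresh primed copies
    return ("star", rename(A, mapping), post)


# Frame A = ls(a, nil) * x |-> a around the assignment a := x with post-condition a = x.
A = ("star", ("ls", "a", "nil"), ("pto", "x", "a"))
Q_next = frame(A, ("assign", "a", "x"), ("eq", "a", "x"))
```

In the example, every occurrence of a in the frame is replaced by a', so the frame keeps describing the old value of a while the new conjunct a = x describes the updated one.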


Applying the forward symbolic execution calculus for verification. We now
explain how the proof rules presented in Fig. 8 and 9 can be used for program
verification. Our goal is to verify that the pre-condition $P$ of a loop-free piece
of code $c$ (in our case, a sequence of program statements) implies the
post-condition $Q$. For this, we apply the symbolic execution calculus and derive a
triple $\{P\}\ c\ \{Q'\}$. It then remains to verify that the final state of the symbolic
execution $Q'$ implies the post-condition $Q$. Here, we face the difficulty that the
symbolic execution introduces additional variables: Let us assume that all
annotations are over a set of variables $\mathbf{x}$, which includes the program variables and
the logical variables. Further assume that the symbolic execution $\{P\}\ c\ \{Q'\}$
introduced the fresh variables $\mathbf{y}$. With the results of Section 3 we can then verify
the entailment $Q' \models^{\mathsf{st}}_{\mathbf{x} \cup \mathbf{y}} Q$. However, we need to guarantee that all models $(s,h)$
of $Q$ with $\mathrm{dom}(s) = \mathbf{x} \cup \mathbf{y}$ are also models when we restrict $\mathrm{dom}(s)$ to $\mathbf{x}$ (note
that we can think of the variables $\mathbf{y}$ as implicitly existentially quantified). In
order to deal with this issue, we require annotations to be robust:

List reverse (left):

    $\{\mathtt{ls}(x, \mathsf{nil})\}$                                   % list reverse
    a := nil;
    while ($x \neq \mathsf{nil}$)
      $\{\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a, \mathsf{nil})\}$
    { b := x.next;
      x.next := a;
      a := x;
      x := b; }
    $\{\mathtt{ls}(x, \mathsf{nil})\}$

List copy (right):

    $\{\mathtt{ls}(x, \mathsf{nil}) * u = x\}$                           % list copy
    malloc(s);
    r := s;
    while ($x \neq \mathsf{nil}$)
      $\{\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s) * s \mapsto m\}$
    { malloc(t);
      % t.data := x.data; not modelled
      s.next := t;
      s := t;
      y := x.next;
      x := y; }
    s.next := nil;
    $\{\mathtt{ls}(u, \mathsf{nil}) * \mathtt{ls}(r, \mathsf{nil})\}$

Fig. 10: List reverse (left) and list copy (right) annotated with pre- and post-condition
and loop invariants.


Definition 11 (Robust Formula). We call a formula $\varphi \in \mathbf{SL}$ robust, if for
all models $(s_1,h)$ and $(s_2,h)$ with $\mathsf{fvs}(\varphi) \subseteq \mathrm{dom}(s_1)$ and $\mathsf{fvs}(\varphi) \subseteq \mathrm{dom}(s_2)$ and
$s_1(x) = s_2(x)$ for all $x \in \mathsf{fvs}(\varphi)$, we have that $(s_1,h) \models^{\mathsf{st}} \varphi$ iff $(s_2,h) \models^{\mathsf{st}} \varphi$.


Lemma 23. Let $\varphi \in \mathbf{SL}$ be a positive formula. Then, $\varphi$ is robust.


Lemma 23 states that all formulas from the positive fragment are robust.
In particular, the annotations in Fig. 10 are robust. As an example for a
non-robust formula consider $\varphi$ in Example 1. We note that Lemma 23 does not cover
all robust formulas, e.g., $\mathsf{t}$ is robust. We leave the identification of further robust
formulas for future work.

We now state the soundness of our symbolic execution calculus:


Lemma 24 (Soundness of Forward Symbolic Execution). Let $c$ be a
sequence of program statements, let $P$ be a robust formula, let $\{P\}\ c\ \{Q\}$ be the
triple obtained from symbolic execution, and let $V$ be the fresh variables
introduced during symbolic execution. Then, $Q$ is robust and for all stack-heap pairs
$(s,h)$, $(s',h')$ such that $(s,h) \models^{\mathsf{st}} P$ and $(s',h')$ can be obtained from $(s,h)$ by
executing $c$, there is a stack $s''$ with $s' \subseteq s''$, $V \subseteq \mathrm{dom}(s'')$ and $(s'',h') \models^{\mathsf{st}} Q$.


Automation. We note that the presented approach can fully automatically verify
that the pre-condition of a loop-free piece of code guarantees its post-condition:
For every program statement, we apply its local proof rule using the frame rule
(and in addition the materialization rule in case the existence of a pointer target
must be guaranteed). We then discharge the entailment query using our decision
procedure from Section 3. We now illustrate this approach on the programs from
Fig. 10. For both programs we verify that the loop invariant is inductive (in both
cases the loop invariant $P$ is propagated forward through the loop body; it is
then checked that the obtained formula $Q$ again implies the loop invariant $P$;
for verifying the implication we apply our decision procedure from Corollary 4):

Example 6. Verifying the loop invariant of list reverse:

    $\{\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a, \mathsf{nil})\}$ (=: P)
    assume($x \neq \mathsf{nil}$)
    $\{\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a, \mathsf{nil}) * x \neq \mathsf{nil}\}$
    # materialization
    $\{x \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a, \mathsf{nil}) * x \neq \mathsf{nil}) * x \mapsto z\}$
    b := x.next
    $\{x \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a, \mathsf{nil}) * x \neq \mathsf{nil}) * x \mapsto z * b = z\}$
    x.next := a
    $\{x \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a, \mathsf{nil}) * x \neq \mathsf{nil}) * x \mapsto a * b = z\}$
    a := x
    $\{x \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a', \mathsf{nil}) * x \neq \mathsf{nil}) * x \mapsto a' * b = z * a = x\}$
    x := b
    $\{x' \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(x', \mathsf{nil}) * \mathtt{ls}(a', \mathsf{nil}) * x' \neq \mathsf{nil}) * x' \mapsto a' * b = z *
       a = x' * x = b\}$ (=: Q)
    $\{\mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(a, \mathsf{nil})\}$ (=: P)


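Example 7. Verifying the loop invariant of list copy:

    $\{\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s) * s \mapsto m\}$ (=: P)
    assume($x \neq \mathsf{nil}$)
    $\{\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s) * s \mapsto m * x \neq \mathsf{nil}\}$
    malloc(t)
    $\{\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s) * s \mapsto m' * x \neq \mathsf{nil} * t \mapsto m\}$
    s.next := t
    $\{\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s) * s \mapsto t * x \neq \mathsf{nil} * t \mapsto m\}$
    s := t
    $\{\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s') * s' \mapsto t * x \neq \mathsf{nil} * t \mapsto m * s = t\}$
    # materialization
    $\{x \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s') * s' \mapsto t * x \neq \mathsf{nil} * t \mapsto m * s = t) *
       x \mapsto z\}$
    y := x.next
    $\{x \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s') * s' \mapsto t * x \neq \mathsf{nil} * t \mapsto m * s = t) *
       x \mapsto z * y = z\}$
    x := y
    $\{x' \mapsto z \mathbin{-\!\!\circledast} (\mathtt{ls}(u, x') * \mathtt{ls}(x', \mathsf{nil}) * \mathtt{ls}(r, s') * s' \mapsto t *
       x' \neq \mathsf{nil} * t \mapsto m * s = t) * x' \mapsto z * y = z * x = y\}$ (=: Q)
    $\{\mathtt{ls}(u, x) * \mathtt{ls}(x, \mathsf{nil}) * \mathtt{ls}(r, s) * s \mapsto m\}$ (=: P)

While our decision procedure can automatically discharge the entailments
in both of the above examples, we give a short direct argument for the
benefit of the reader for the entailment check of Example 6 (a direct argument
could similarly be worked out for Example 7): We note that $Q'$ simplifies to
$Q'' = \{a \mapsto x \mathbin{-\!\!\circledast} (\mathtt{ls}(a, \mathsf{nil}) * \mathtt{ls}(a', \mathsf{nil})) * a \mapsto a'\}$. Every model $(s,h)$ of $Q''$ must
consist of a pointer $a \mapsto a'$, a list segment $\mathtt{ls}(a', \mathsf{nil})$ and a heap $h'$ to which the
pointer $a \mapsto x$ can be added in order to obtain the list segment $\mathtt{ls}(a, \mathsf{nil})$; by
looking at the semantics of the list segment predicate we see that $h'$ in fact must
be the list segment $\mathtt{ls}(x, \mathsf{nil})$. Further, the pointer $a \mapsto a'$ can be composed with
the list segment $\mathtt{ls}(a', \mathsf{nil})$ in order to obtain $\mathtt{ls}(a, \mathsf{nil})$.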


5 Normal Forms and the Abduction Problem 


In this section, we discuss how every AMS can be expressed by a formula, which
in turn makes it possible to compute a normal form for every formula. We then
discuss how the normal form transformation has applications to the abduction
problem.


Normal Form. We lift the abstraction functions from stacks to sets of variables:
Let $\mathbf{x} \subseteq \mathbf{Var}$ be a finite set of variables and $\varphi \in \mathbf{SL}$ be a formula with $\mathsf{fvs}(\varphi) \subseteq \mathbf{x}$.
We set $\alpha_{\mathbf{x}}(\varphi) := \bigcup \{\alpha_s(\varphi) \mid \mathrm{dom}(s) = \mathbf{x}\}$ and $\mathsf{abst}_{\mathbf{x}}(\varphi) := \alpha_{\mathbf{x}}(\varphi) \cap \mathbf{AMS}_{\lceil\varphi\rceil,\mathbf{x}}$,
where $\mathbf{AMS}_{k,\mathbf{x}} := \{(V, E, \rho, \gamma) \in \mathbf{AMS} \mid \bigcup V = \mathbf{x} \text{ and } \gamma \le k\}$. (We note that
$\alpha_{\mathbf{x}}(\varphi)$ is computable by the same argument as in the proof of Theorem 5.)


Definition 12 (Normal Form). Let $\mathsf{NF}_{\mathbf{x}}(\varphi) := \bigvee_{A \in \mathsf{abst}_{\mathbf{x}}(\varphi)} \mathsf{AMS2SL}^{\lceil\varphi\rceil}(A)$ be the
normal form of $\varphi$, where $\mathsf{AMS2SL}^{m}(A)$ is defined as in Fig. 11. △


The definition of $\mathsf{AMS2SL}^{m}(A)$ represents a straightforward encoding of the
AMS $A$: aliasing encodes the aliasing between the stack variables as implied
by $V$; graph encodes the points-to assertions and lists of length at least two
corresponding to the edges $E$; negalloc encodes that the negative chunks $R \in \rho$
precisely allocate the variables $v \in R$; garbage ensures that there are either
exactly $\gamma$ additional non-empty memory chunks that do not allocate any stack
variable (if $\gamma < m$) or at least $\gamma$ such chunks (if $\gamma = m$); negalloc and garbage
use the formula negchunk which precisely encodes the definition of a negative
chunk. We have the following result about normal forms:


Theorem 6. $\mathsf{NF}_{\mathbf{x}}(\varphi) \models^{\mathsf{st}}_{\mathbf{x}} \varphi$ and $\varphi \models^{\mathsf{st}}_{\mathbf{x}} \mathsf{NF}_{\mathbf{x}}(\varphi)$.


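\begin{align*}
\mathsf{AMS2SL}^{m}(A) &:= \mathsf{aliasing}(A) * \mathsf{graph}(A) * \mathsf{negalloc}(A) * \mathsf{garbage}^{m}(A) \\
\mathsf{aliasing}(A) &:= \Big(\mathop{*}\limits_{v \in V,\ x, y \in v} x = y\Big) * \Big(\mathop{*}\limits_{v, w \in V,\ v \neq w} \max(v) \neq \max(w)\Big) \\
\mathsf{graph}(A) &:= \Big(\mathop{*}\limits_{E(v) = (v', {=}1)} \max(v) \mapsto \max(v')\Big) * \Big(\mathop{*}\limits_{E(v) = (v', {\ge}2)} \mathtt{ls}_{\ge 2}(\max(v), \max(v'))\Big) \\
\mathsf{negalloc}(A) &:= \mathop{*}\limits_{R \in \rho} \Big(\mathsf{negchunk}(A) \wedge \bigwedge_{v \in R} \mathsf{alloc}(\max(v)) \wedge \bigwedge_{v \in V \setminus R} \neg\mathsf{alloc}(\max(v))\Big) \\
\mathsf{garbage}^{m}(A) &:=
\begin{cases}
\mathsf{garbage}(A, \gamma) & \text{if } \gamma < m \\
\mathsf{garbage}(A, m-1) * \big(\neg\mathsf{emp} \wedge \bigwedge_{v \in V} \neg\mathsf{alloc}(\max(v))\big) & \text{otherwise}
\end{cases} \\
\mathsf{garbage}(A, k) &:=
\begin{cases}
\mathsf{emp} & \text{if } k = 0 \\
\mathsf{garbage}(A, k-1) * \big(\mathsf{negchunk}(A) \wedge \bigwedge_{v \in V} \neg\mathsf{alloc}(\max(v))\big) & \text{otherwise}
\end{cases} \\
\mathsf{negchunk}(A) &:= \neg\mathsf{emp} \wedge \neg(\neg\mathsf{emp} * \neg\mathsf{emp}) \wedge \bigwedge_{v, w \in V,\ \psi \in \{\max(v) \mapsto \max(w),\ \mathtt{ls}(\max(v), \max(w))\}} \neg\psi \\
\mathsf{alloc}(x) &:= \neg((x \mapsto \mathsf{nil}) \mathbin{-\!\!\circledast} \mathsf{t}) \\
\mathtt{ls}_{\ge 2}(x, y) &:= \mathtt{ls}(x, y) \wedge \neg(x \mapsto y)
\end{align*}

Fig. 11: The induced formula $\mathsf{AMS2SL}^{m}(A)$ of AMS $A = (V, E, \rho, \gamma)$ with $\gamma \le m$.


The abduction problem. We consider the following relaxation of the entailment
problem: The abduction problem is to replace the question mark in the entailment
$\varphi * [?] \models^{\mathsf{st}}_{\mathbf{x}} \psi$ by a formula such that the entailment becomes true. This problem
is central for obtaining a scalable program analyzer as discussed in [10].⁹ The
abduction problem does in general not have a unique solution. Following [10],
we thus consider optimization versions of the abduction problem, looking for
logically weakest and spatially minimal solutions: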


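Definition 13. Let $\varphi, \psi \in \mathrm{SL}$ and let $\mathbf{x} \subseteq \mathrm{Var}$ be a finite set of variables. A 
formula $\zeta$ is the weakest solution to the abduction problem $\varphi * [?] \models_{\mathbf{x}} \psi$ if it 
holds for all abduction solutions $\zeta'$ that $\zeta' \models_{\mathbf{x}} \zeta$. An abduction solution $\zeta$ is 
minimal if there is no abduction solution $\zeta'$ with $\zeta \models_{\mathbf{x}} \zeta' * (\neg\mathsf{emp})$. 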


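Lemma 25. Let $\varphi, \psi$ be formulas and let $\mathbf{x} \subseteq \mathrm{Var}$ be a finite set of variables. 
Then, 1) the weakest solution to the abduction problem $\varphi * [?] \models_{\mathbf{x}} \psi$ is given by 
the formula $\varphi \mathrel{-\!\!*} \psi$, and 2) the weakest minimal solution is given by the formula 
$(\varphi \mathrel{-\!\!*} \psi) \land \neg((\varphi \mathrel{-\!\!*} \psi) * \neg\mathsf{emp})$. 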
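

As a small illustration of Definition 13 and Lemma 25, consider the following hypothetical instance (the concrete formulas are chosen here for illustration and are not part of the development above). Let $\mathbf{x} = \{x, y, z\}$ and

\[
\varphi := x \mapsto y, \qquad \psi := x \mapsto y * y \mapsto z, \qquad \zeta_0 := y \mapsto z .
\]

Then $\zeta_0$ is an abduction solution, because $\varphi * \zeta_0$ is literally $\psi$. By Lemma 25, every solution entails the weakest one, so

\[
\zeta_0 \models_{\mathbf{x}} \varphi \mathrel{-\!\!*} \psi .
\]

The solution $\zeta_0$ is also minimal. Every model of $\zeta_0$ has a heap with exactly one cell, so $\zeta_0 \models_{\mathbf{x}} \zeta' * \neg\mathsf{emp}$ would force $\zeta'$ to hold on the empty heap for some stack. Adding the single cell required by $\varphi$ to this empty heap yields a one-cell model of $\varphi * \zeta'$, which cannot satisfy $\psi$ because $\psi$ requires two cells. Hence no such $\zeta'$ is an abduction solution.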


$^2$ While the program analyzer proposed in [10] employs bi-abductive reasoning, the 
bi-abduction procedure in fact proceeds in two separate abduction and frame-inference 
steps, where the main technical challenge is the abduction step, as frame inference 
can be incorporated into entailment checking. We believe that the situation for SSL 
is similar, i.e., solving abduction is the key to implementing a bi-abductive prover 
for SSL; hence, our focus on the abduction problem. 
We now explain how the normal form has applications to the abduction 
problem. According to Lemma 25, the best solutions to the abduction problem 
are given by the formulas $\zeta := \varphi \mathrel{-\!\!*} \psi$ and $\zeta' := (\varphi \mathrel{-\!\!*} \psi) \land \neg((\varphi \mathrel{-\!\!*} \psi) * \neg\mathsf{emp})$. While 
the existence of these solutions is guaranteed, we do not a priori 
have a means to compute an explicit representation of these solutions nor to 
further analyze their structure. However, the normal form operator allows us to 
obtain the explicit representations $\mathrm{NF}_{\mathbf{x}}(\zeta)$ and $\mathrm{NF}_{\mathbf{x}}(\zeta')$. We believe that using 
these explicit representations in a program analyzer or studying their properties 
is an interesting topic for further research. Here, we establish one concrete result 
on solutions to the abduction problem based on normal forms: 


We can compute the weakest resp. the weakest minimal solution to the abduction 
problem from the positive fragment. Observe that among the sub-formulas of 
aliasing and graph, only the formula $\mathsf{ls}_{\ge 2}$ is negative. To be able to use $\mathsf{ls}_{\ge 2}(x, y)$ 
in a positive formula, we first need to add a new spatial atom $\mathsf{ls}_{\ge 2}(x, y)$ to SSL 
with the following semantics: $\mathsf{ls}_{\ge 2}(x, y)$ holds in a model iff the model is a list 
segment of length at least 2 from $x$ to $y$. (The whole development in Sections 2 
and 3 can be extended by this predicate.) We can then simplify the formula 
$\mathsf{graph}(\mathcal{A})$ in $\mathsf{AMS2SL}^{m}(\mathcal{A})$ by directly translating edges $E(v) = \langle v', \ge 2 \rangle$ to the 
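atom $\mathsf{ls}_{\ge 2}(\max(v), \max(v'))$. Then, $\bigvee_{\mathcal{A} = \langle V, E, \rho, \gamma \rangle \in \alpha_{\mathbf{x}}(\zeta) \text{ with } \rho = \emptyset,\, \gamma = 0} \mathsf{AMS2SL}^{\lceil \zeta \rceil}(\mathcal{A})$ 
for $\zeta = \varphi \mathrel{-\!\!*} \psi$ or $\zeta = (\varphi \mathrel{-\!\!*} \psi) \land \neg((\varphi \mathrel{-\!\!*} \psi) * \neg\mathsf{emp})$ is the weakest resp. the weakest 
minimal solution to the abduction problem from the positive fragment. 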
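

To see why restricting to $\rho = \emptyset$ and $\gamma = 0$ yields a positive formula, one can unfold the definitions of Fig. 11 for such an AMS. The following is a sketch, assuming that the empty iterated $*$ is read as $\mathsf{emp}$, that the base case of garbage is $\mathsf{emp}$, and that $m \ge 1$:

\[
\begin{aligned}
\mathsf{negalloc}(\mathcal{A}) &= \mathop{\ast}\limits_{R \in \emptyset} (\dots) = \mathsf{emp}, \\
\mathsf{garbage}^{m}(\mathcal{A}) &= \mathsf{garbage}(\mathcal{A}, 0) = \mathsf{emp}, \\
\mathsf{AMS2SL}^{m}(\mathcal{A}) &= \mathsf{aliasing}(\mathcal{A}) * \mathsf{graph}(\mathcal{A}) * \mathsf{emp} * \mathsf{emp}.
\end{aligned}
\]

The sub-formulas negchunk and alloc, which contain negation and septraction, therefore no longer occur. What remains are the equalities and disequalities of aliasing and the points-to and $\mathsf{ls}_{\ge 2}$ assertions of graph.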


6 Conclusion 


We have shown that the satisfiability problem for “strong” separation logic with 
lists is in the same complexity class as the satisfiability problem for standard 
“weak” separation logic without any data structures: PSPACE-complete. This is 
in stark contrast to the undecidability result for standard (weak) SL semantics, 
as shown in [16]. 


We have demonstrated the potential of SSL for program verification: 1) We 
have provided symbolic execution rules that, in conjunction with our result on 
the decidability of entailment, can be used for fully automatically discharging 
verification conditions. 2) We have discussed how to compute explicit representations 
of optimal solutions of the abduction problem. This constitutes the first 
work that addresses the abduction problem for a separation logic closed under 
Boolean operators and the magic wand. 


We consider our results just the first steps in examining strong-separation 
logic, motivated by the desire to circumvent the undecidability result of [16]. Future 
work is concerned with the practical evaluation of our decision procedures, 
with extending the symbolic execution calculus to a full Hoare logic, as well as 
with extending the results of this paper to richer separation logics (SL) such as SL 
with nested data structures or SL with limited support for arithmetic reasoning. 